What Your Robots Aren’t Telling You: Why Today’s Observability Tools are Failing the Field

For robotics applications to achieve high availability and reliable performance, the industry must first establish much higher standards of observability.

Over the past few years, “observability” has garnered a great deal of attention in certain circles of the tech industry. The conversation encompasses the term’s meaning and origins, its appropriation by technical marketers, and whether or not it’s distinct from related concepts like application monitoring. And yet, despite the breadth of discussion, “observability” seems to have settled its roots in a rather narrow patch of ground: backend distributed systems, and particularly microservices.

The advent of the internet and its sprawling architecture of distributed systems has led to a Cambrian explosion of companies, tools, and platforms dedicated to supporting those systems. As the complexity of this landscape grew, so did the challenges we faced in our efforts to monitor and understand it. It was in this landscape that the term “observability” was first utilized in the context of modern tech.

Today, however, the world of distributed computing is pushing even further, to the edge; it’s stretched beyond the limits of the data center and into our immediate physical surroundings. What was once a concept exclusive to the world of backend systems is now equally applicable and equally important to the field of robotics. 

Observability 101 

Search “observability” on Twitter, and you’re likely to find some detractors deeming the term “marketing speak.” But, believe it or not, the idea of observability wasn’t an invention of Silicon Valley marketers. In fact, it was coined all the way back in 1960 by the renowned engineer, mathematician, and National Medal of Science recipient, Rudolf Kálmán.

In its original conception, as a new measure within the field of control theory, observability was defined as follows:

“Observability is a measure of how readily a system’s internal states can be inferred from knowledge of its external outputs.”

The definition is clear, concise, and broadly applicable, and that last quality is particularly important to our current discussion.
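To make the control-theory definition concrete, here is a minimal sketch of the standard rank test for a linear system x' = Ax, y = Cx: the system is observable when the stacked matrix [C; CA; ...; CA^(n-1)] has rank n. The example system (a cart whose state is position and velocity) and all function names are illustrative assumptions, not something taken from Kálmán's paper or from any particular library.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Matrix rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next(
            (r for r in range(rank_count, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


def is_observable(a, c):
    """Rank test: stack C, CA, ..., CA^(n-1) and compare the rank to n."""
    n = len(a)
    stacked, block = [], c
    for _ in range(n):
        stacked.extend(block)
        block = mat_mul(block, a)
    return rank(stacked) == n


# Hypothetical cart: state = [position, velocity], constant velocity dynamics.
A = [[0, 1], [0, 0]]
MEASURE_POSITION = [[1, 0]]
MEASURE_VELOCITY = [[0, 1]]

print(is_observable(A, MEASURE_POSITION))  # True
print(is_observable(A, MEASURE_VELOCITY))  # False
```

Watching position over time reveals velocity, so the full state can be inferred. Watching only velocity never reveals where the cart started, so part of the internal state stays hidden no matter how long you observe.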

Observability for All, Including Robots (Especially for Robots)

When viewed through the lens of its original definition, observability’s relevance to robotics becomes rather evident. If a corporation deploys a robot (or 1,000 robots), that corporation wants to ensure that its new (often very expensive) assets succeed (and survive) in their new roles. And in order to achieve that goal, observability is essential.

Observability in the context of robotics is, in many ways, similar to observability in the context of distributed computing. In both cases, the fundamental goals are the same: to measure, analyze, introspect, and increase one’s overall understanding of an operational system. The difference is that, in robotics, the “system” is made up of robots, operating either independently or in concert, and deployed in literal, physical space. Still, both robot operators and DevOps engineers use observability to many of the same ends, such as diagnosing failures, iterating on systems design, and supporting automated processes.

Despite these fundamental similarities, however, the world of web services has enjoyed decades of investment into microservices-oriented observability tools while the world of robotics has received a relative pittance. 

For robotics applications to achieve the same high availability and reliable performance we’ve come to expect from web services, the industry must first establish much higher standards of observability. But doing so is no easy task.

Nothing Comes Easy in the World of Robotics

The most significant difference between robotics and more traditional domains of observability is complexity. In virtually every aspect, observability in robotics is more complex and more challenging. 

Principal among those aspects is the environment. In the world of robotics, observability must overcome the challenges of an unbounded system. The physical world is a continuous and high-dimensional domain. It presents a near-infinite number of possibilities, most of which are unforeseeable. What’s more, failure modes often bridge the hardware/software divide.

To help illustrate the unique challenges of operating in the real world, consider some of these scenarios we’ve heard from customers:  

  • Localization failure: A mobile robot fails to localize due to changes in an office floor plan.
  • Vision failure: An object classifier fails due to inadequate lighting in a warehouse.
  • Sensor calibration failure: A pose estimator fails due to changes in camera extrinsics after a collision.
  • Unmodeled human interaction: A sidewalk delivery robot is blocked by a group of protestors.
  • Unmodeled environmental interaction: An inventory scanning robot is unable to navigate past a spill in the juice aisle.

From spilled Snapple to political action, robots encounter a multitude of “unknown unknowns” in their day-to-day operations. And as robotic technologies continue to advance, so too will the complexity of their roles and environments. 

In every one of the above scenarios, today’s standard, web-services-oriented observability tools would be woefully inadequate. Failure modes in robotics are difficult to predict, difficult to observe, and as a result, difficult to remedy. What’s more, robotic deployments are typically high-ticket, high-risk investments made by businesses in competitive markets.

So, the opportunities for failure are manifold, the costs associated with their resolution are high, and the overall business implications are significant. I believe it’s safe to say that robotics is in need of a new class of observability tools, one built specifically for our industry, and the unique challenges we face.

What a Robot Needs to Succeed

When developing an observability platform for robotics, there are a number of unique requirements and considerations one must account for. First and foremost among those considerations is the need for ad-hoc debugging tools in order to navigate the unavoidable “unknown unknowns” we discussed. 

The unpredictability of unstructured and semi-structured environments means that any observability platform meant for robotics must go well beyond the simple monitoring of known failure points.

The standard pillars of observability (logs, metrics, and traces) are necessary but nowhere near sufficient. Context from the physical world is also needed, and it can come in many different forms: maps, point clouds, geolocation, poses, video, odometry. A platform must therefore provide adequate support for these visual and geometric data types. Finally, robustness and versatility are required to overcome less than ideal network conditions in the wild.

Just as important as reliable, multi-faceted sensor data is the metadata associated with it. An observability platform for robotics must accommodate multiple dimensions of high-cardinality metadata (software and hardware version, experiment ID, customer ID, site ID, etc.) and provide first-class support for time-related dimensions. In addition to these requirements, we can also add:

  • Integrations with common robotics protocols and platforms
  • Robustness in the presence of unreliable network connectivity
  • Visualization and proper semantic interpretation of robotics data types
  • Ingestion and playback of historical data
  • Anomaly detection
  • API extensibility
  • Compatibility with the broader observability ecosystem
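Two of the requirements above, high-cardinality metadata and robustness to unreliable connectivity, can be illustrated with a small sketch. It assumes a hypothetical on-robot agent that tags every data point with metadata and holds points in a bounded local buffer until an upload succeeds. The names TelemetryPoint, StoreAndForwardBuffer, and the send callback are invented for illustration and do not describe any specific product's API.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TelemetryPoint:
    """One timestamped sample plus the metadata needed to slice it later."""
    stream: str   # e.g. "battery.voltage" or "localization.pose"
    value: object
    tags: dict    # e.g. software version, site ID, experiment ID
    timestamp: float = field(default_factory=time.time)


class StoreAndForwardBuffer:
    """Holds points locally and uploads them, in order, when the link is up."""

    def __init__(self, max_points=10_000):
        # A bounded deque silently drops the OLDEST point once it is full.
        self._pending = deque(maxlen=max_points)

    def record(self, point):
        self._pending.append(point)

    def flush(self, send):
        """Try to upload everything; stop at the first connection failure.

        `send` is an assumed callable that raises ConnectionError when the
        network is unavailable. Returns the number of points uploaded.
        """
        sent = 0
        while self._pending:
            try:
                send(self._pending[0])
            except ConnectionError:
                break  # keep the point and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    buffer = StoreAndForwardBuffer(max_points=1000)
    tags = {"software_version": "1.4.2", "site_id": "warehouse-7"}
    buffer.record(TelemetryPoint("battery.voltage", 24.1, tags))
    buffer.record(TelemetryPoint("battery.voltage", 23.9, tags))

    uploaded = []
    print(buffer.flush(uploaded.append), "points uploaded")
```

Dropping the oldest data when the buffer fills is only one possible policy. A real deployment might instead prioritize by stream, downsample, or spill to disk, depending on which data matters most after a long outage.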

A Platform That’s Greater Than the Sum of its Parts

That’s a whole lot of features and capabilities to consider, none of which are exactly a breeze to implement. An effective robotics observability platform must not only check every box on this list but bring all those features together into a single, unified toolset. That toolset should provide visibility and actionability in equal measure, and offer a UI friendly enough to make quick decision-making and action-taking possible.

It’s a stupendously tall order. But, thankfully, there’s hope. A small number of forward-thinking companies are just now crossing the horizon to take on the challenge of robotic observability. As is so often the case in the world of tech, major innovations are much closer than they appear.

Ian Sherman

About Ian Sherman

Ian Sherman is Head of Software at Formant, a company building cloud infrastructure for robotics. Prior to Formant, Ian led engineering teams at Google X and Bot & Dolly. The through line of his career has been tool building, for engineers and artists alike. He’s inspired by interdisciplinary collaboration of all types; currently this takes the form of applying patterns and practices from distributed systems operations to emerging applications in robotics. Ian earned his BA in Computer Science from Brown University.
