DevOps and SRE professionals love the idea of the “democratization” of observability via collectively maintained projects like OpenMetrics and OpenTelemetry.
Performance and systems monitoring has been around for decades at this point, but the industry is far from feature-complete. It has grown far more complex in the last few years. High demands from DevOps and SRE communities mean these platforms now collect more metrics, at finer granularity (often as frequently as every second), and offer more user-customizable dashboarding than ever before. Some are even peeking into the Linux kernel using eBPF technology, which offers unprecedented visibility into the performance of mission-critical applications.
And while monitoring (the ability to watch and understand the state of a system, application, or infrastructure) has grown in complexity, a second practice, observability, has grown up alongside it. Where monitoring tools report on the state of a system, observability tools allow teams to investigate and debug their systems actively. A monitoring tool is like a static dashboard, whereas an observability tool is a compass for exploring metrics, traces, events, and logs.
Think of it as the difference between asking, “What’s going on here?” and, “Why is this happening?”
One of the biggest challenges in setting up these monitoring/observability tools is interoperability. Collecting data from all your disparate sources via functional collectors is one problem, but so are the proprietary data storage formats that many of these tools have implemented. If DevOps and SRE teams can’t easily integrate data between different tools, then they’ll never get the single-pane-of-glass dashboards they demand.
Enter OpenMetrics and OpenTelemetry, two collaborative, open-source projects that are part of the Cloud Native Computing Foundation (CNCF). OpenMetrics is a sandbox project, meaning that it’s ideal for innovators and those who love to tinker with technology. OpenTelemetry has matured into an incubating project, which means it is being used successfully in production, has a healthy number of diverse committers, and shows a “substantial ongoing flow of commits and merged contributions.”
OpenMetrics bills itself as the “de-facto standard for transmitting cloud-native metrics at scale, with support for both text representation and Protocol Buffers.” In other words, it’s a format for exposing monitoring data so that other tools can collect it, whether they hold it in memory for streaming analytics or write it to a time-series database for historical batch analysis.
OpenMetrics is based on a specification that started in the open-source Prometheus project, a monitoring platform and time-series database. This text-based metric exposition format, called the Prometheus exposition format 0.0.4, has had a stable specification since 2014. That’s ancient times in the observability space, and yet Prometheus is by far the most widely supported format for collecting and storing metrics from mission-critical services and applications, like web servers, databases, messaging queues, and more.
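To make the format concrete, here is a minimal sketch of how a counter could be rendered in the Prometheus text exposition format. The helper names (`escape_label_value`, `render_counter`) and the sample metric are hypothetical, chosen only for illustration. The sketch follows the 0.0.4 text format; OpenMetrics layers further requirements on top, such as a terminating `# EOF` line, which this example does not attempt to cover.

```python
def escape_label_value(value: str) -> str:
    # Label values escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples) -> str:
    """Render one counter family as text.

    `samples` is a list of (labels_dict, value) pairs. This is a hypothetical
    helper for illustration, not part of any Prometheus client library.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_counter(
            "http_requests_total",
            "Total HTTP requests handled.",
            [
                ({"method": "GET", "code": "200"}, 1027),
                ({"method": "POST", "code": "500"}, 3),
            ],
        ),
        end="",
    )
```

Each metric family is a `# HELP` line, a `# TYPE` line, and one line per labeled sample, which is a large part of why the format is easy for both people and scrapers to read.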
The goal of OpenMetrics is to convince the vendors of services and applications, regardless of whether they’re open-source or proprietary, that they should emit metrics data in a specific format over HTTP. Whatever happens on the other side, from storage to analysis to dashboarding, isn’t OpenMetrics’ concern. It’s vendor-neutral and designed to help many people work together on a shared, ongoing problem.
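As a rough illustration of what “emit metrics data in a specific format over HTTP” can look like from the application side, the sketch below serves two metrics at a `/metrics` path using only Python’s standard library. The metric names, port, and `serve` helper are assumptions made for this example. It uses the Prometheus 0.0.4 text format and content type rather than the full OpenMetrics specification, and a production service would more likely use a maintained client library.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Content type used by the Prometheus text exposition format 0.0.4.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"
START_TIME = time.time()
SCRAPE_COUNT = 0  # hypothetical in-process counter, for illustration only


def render_metrics(scrape_count: int, uptime_seconds: float) -> str:
    """Build the response body: one counter and one gauge."""
    return (
        "# HELP demo_scrapes_total Number of times /metrics has been scraped.\n"
        "# TYPE demo_scrapes_total counter\n"
        f"demo_scrapes_total {scrape_count}\n"
        "# HELP demo_uptime_seconds Seconds since the process started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime_seconds:.3f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global SCRAPE_COUNT
        if self.path != "/metrics":
            self.send_error(404)
            return
        SCRAPE_COUNT += 1
        body = render_metrics(SCRAPE_COUNT, time.time() - START_TIME).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Blocks forever; whatever scrapes the endpoint decides what to do with the data.
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting `http://127.0.0.1:8000/metrics` returns the plain-text body. The application knows nothing about where the data ends up, which is the separation of concerns the format is aiming for.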
OpenTelemetry is “high-quality, ubiquitous, and portable telemetry to enable effective observability.” That’s an important distinction—OpenTelemetry goes beyond metrics data, attempting to standardize how applications, platforms, and services forward all telemetry data (metrics, logs, and traces) to backend storage/analysis systems.
While the project is still only in the beta phase, it has already released a robust collection of tools and SDKs in Java, Go, Python, and more to help development, DevOps, and SRE teams observe their systems with new depth. The project is the result of a merger of two previous projects, OpenTracing and OpenCensus, the latter of which was built and sponsored primarily by Google. Due to this heavy industry backing, there are already hundreds of collectors and exporters that help you send telemetry data to any observability platform that supports OpenTelemetry.
The key here is that interoperability is everything. That’s the incentive behind these two projects working together to make their open data formats interoperable with one another.
The same goes for the monitoring/observability vendors, who want to prove that they’ve modernized away from traditional monitoring products, which were heavily proprietary and frequently impossible to integrate with other platforms. Grafana heavily supports OpenMetrics via its management of the Prometheus toolkit, and OpenTelemetry has big backers in New Relic, Sumo Logic, Honeycomb, Dynatrace, and more. By supporting these projects financially and through code contributions, these vendors reject the threat of vendor lock-in. They also make DevOps and SRE teams more comfortable adopting their platforms, since those teams have the reassurance that they could eventually take their observability data elsewhere if need be.
These parallel moves toward interoperability and industry investment will inevitably make for a complicated future. DevOps and SRE professionals love the idea of the “democratization” of observability via collectively maintained telemetry projects like OpenMetrics and OpenTelemetry, but vendors are always looking for an exceptional edge over the competition, which means some of the most exciting new areas of observability will inevitably stay under closed-source lock and key.