Not all outages are the same. Given that most organizations do not have unlimited IT resources, the severity of an outage to the business must somehow be quantified.
As application complexity has grown, organizations increasingly rely on continuous intelligence (CI) to help reduce outages and speed time to restoration. CI helps assimilate and make sense of the traces, logs, and telemetry data that provide insights into app performance. Most critically, it helps prioritize issues so teams can get to the root cause of performance problems and outages.
But not all outages are the same. Certainly, with today’s high user and customer expectation levels, any performance degradation needs to be addressed as fast as possible. However, given that most organizations do not have unlimited IT resources, issues must be prioritized. But how?
Those looking for guidance might turn to the Uptime Institute and its Outage Severity Rating (OSR) classification methodology. The institute developed the OSR to help distinguish between a service outage that threatens the business and an interruption that has little or no impact. According to the Uptime Institute, “the OSR provides infrastructure practitioners a common lexicon to use in forming their own service delivery capacity strategies and describing their own outages in terms of business impacts, rather than referencing outages based upon the number of physical infrastructure components involved.”
The institute noted that while there are different ways to categorize the mission criticality of various systems as a planning tool for disaster recovery and availability/redundancy investments, there is no single metric for measuring the severity or impact of outages.
Such a metric is needed to allocate resources and identify which systems and applications are essential to the business. For instance, the institute noted that while a two-week outage of a human resources system would be annoying, a five-minute loss of a currency trading system could bring a business down.
Taking such considerations into account, the institute examined publicly reported outages and developed a severity scale. The scale defines five categories of outages by their impact on the business, ranging from a Category 1 outage, which has little or no obvious impact, to a Category 5 outage, which can result in financial losses, compliance breaches, customer losses, and more.
When the institute announced its findings, Andy Lawrence, Executive Director of Research at the Uptime Institute, said: “Public awareness of outages is becoming more pronounced as the number increases year over year. In most cases, we find it difficult to understand the true nature and magnitude of the outage since most practitioners still characterize the severity of an outage based on the amount of affected physical infrastructure equipment.”
A final word on outages
Many organizations use CI to prioritize alarms and alerts about issues that affect application performance. They should also pair such insights with the relative impact of any potential problem on the business.
One way to start is to evaluate the impact an outage of a particular application or system would have on the business. The Uptime Institute’s Outage Severity Rating is one way to help organize thoughts and plan accordingly.
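As a rough illustration of pairing alert priority with business impact, the Python sketch below ranks alerts first by an impact category and then by a technical urgency score. It is a minimal sketch under stated assumptions, not the Uptime Institute's method: the system names, the category assignments, the default category, and the `technical_score` field are all hypothetical, and only the idea of a five-level scale (1 for little or no obvious impact, 5 for the most severe) comes from the OSR description above.

```python
from dataclasses import dataclass

# Hypothetical mapping from system name to a business-impact category (1-5).
# 1 = little or no obvious impact, 5 = most severe. These assignments are
# illustrative assumptions, not ratings published by the Uptime Institute.
BUSINESS_IMPACT = {
    "currency-trading": 5,
    "hr-portal": 2,
}

DEFAULT_IMPACT = 3  # assumed fallback for systems that have not been assessed


@dataclass
class Alert:
    system: str
    technical_score: float  # hypothetical 0-1 urgency score from a CI tool
    message: str


def prioritize(alerts):
    """Order alerts by business impact first, then by technical urgency."""
    return sorted(
        alerts,
        key=lambda a: (
            BUSINESS_IMPACT.get(a.system, DEFAULT_IMPACT),
            a.technical_score,
        ),
        reverse=True,
    )


alerts = [
    Alert("hr-portal", 0.9, "login latency high"),
    Alert("currency-trading", 0.4, "order queue backing up"),
    Alert("internal-wiki", 0.7, "search index stale"),
]

ordered = prioritize(alerts)
for alert in ordered:
    print(alert.system, "-", alert.message)
```

In this sketch the trading-system alert is handled first even though its technical score is the lowest, which mirrors the point that a short loss of a critical system can matter more than a long outage of a less critical one.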