Why Digital Transformation Drives the Need for AIOps

Due to the complexity of modern apps, the industry is undergoing a shift away from separate network and application monitoring tools towards AIOps.

Infrastructure and its impact on application performance are increasingly important as companies deploy new applications and undergo digital transformations. Unfortunately, the complexity of the infrastructure underlying modern apps makes troubleshooting and problem resolution much harder. Traditional monitoring tools fall short. What’s often needed is an approach to problem identification and rapid resolution based on AIOps.

In the industry, there is some debate about what AIOps truly means. Some implementations work from pre-defined rules and take action automatically when certain conditions are detected. For example, if a compute-intensive application’s performance degrades, a rule might shift more data to solid-state drives, fire up more compute instances, and throttle back the bandwidth consumption of other applications. A pure AIOps solution would not need such rules. It should automatically discover the relationships between status data and the business outcome. (A rules-based system, by contrast, requires about as much setup work as many manual systems, because someone must write and maintain every rule.)
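To make the rules-based style concrete, here is a minimal Python sketch of the rule described above. Everything in it is a hypothetical illustration rather than any product's API: the AppStatus fields, the 1.5x degradation factor, and the action strings are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppStatus:
    """Hypothetical snapshot of one application's health."""
    name: str
    compute_intensive: bool
    latency_ms: float           # currently observed latency
    baseline_latency_ms: float  # latency considered normal


def rule_based_actions(status: AppStatus, degradation_factor: float = 1.5) -> list[str]:
    """Return the actions a hand-written rule would trigger.

    The rule fires only for compute-intensive apps whose latency exceeds
    the baseline by the (assumed) degradation factor.
    """
    degraded = status.latency_ms > degradation_factor * status.baseline_latency_ms
    if status.compute_intensive and degraded:
        return [
            "shift more data to solid-state drives",
            "start additional compute instances",
            "throttle bandwidth of other applications",
        ]
    return []
```

Note how much is hard-coded: the threshold, the condition, and the responses all had to be decided by a person in advance. That is the setup burden a pure AIOps approach aims to avoid.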

There is also a difference between monitoring and management. AIOps should surface insights itself, rather than leaving a human user to look at data and sort out what is going on. The AIOps tool should tell an IT manager that something needs attention. The goal of the automation AIOps provides is to reduce the time spent intervening manually and leave more time for work on the applications themselves.

AIOps in action

One way to look at AIOps is in terms of how it is different from other monitoring and application performance management approaches.

Take, for example, a customer having a poor experience when trying to complete an online transaction. The slow performance could have any of several causes. The broadband link the customer is using might be slow, the Internet backbone that the transaction’s packets traverse might be congested, the primary application server might be under strain from too many simultaneous sessions, a secondary app (e.g., a CRM system that pulls existing customer information to help complete the transaction) might have slow response times, or a third-party database (e.g., a credit check system) might be offline.

The traditional approach to application performance management would be to wait for an angry call from the customer about the poor quality of their transaction. An Ops team might then use troubleshooting tools to try to identify the problem and make changes (perhaps increasing the app server’s capacity).

A more proactive approach would spot the customer having a problem and take corrective actions in real time. For example, an Ops manager might allocate more bandwidth to the CRM system to speed that part of the transaction.

Both approaches are labor-intensive and require that the Ops team sort through many logs, traces, alerts, and other data from a slew of disparate systems. They must somehow aggregate this data, correlate it, and try to make sense of it to identify the root cause of the problem.

Given the complexity of modern apps, neither approach is practical. AIOps platforms combine data from traditional monitoring tools with streaming telemetry and analyze all of it using AI. The AI examines each data source and correlates anomalies across sources to automate the identification of problems, while also providing detailed information on how to fix the issue. Thus, if an AIOps platform is properly implemented, it not only provides more visibility into potential problems but also eliminates many manual troubleshooting and remediation tasks.
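As a rough illustration of correlating anomalies across data sources, the Python sketch below flags outliers in each telemetry series with a simple z-score and then reports the time steps at which several sources are anomalous together. This is a deliberately simplified stand-in for the analysis described above, not how any particular AIOps platform works; the function names, the 2.0 threshold, and the sample metric names are all assumptions.

```python
from statistics import mean, stdev


def anomalous_indices(series, threshold=2.0):
    """Indices whose z-score (distance from the mean in standard
    deviations) exceeds the assumed threshold."""
    if len(series) < 2:
        return set()
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return set()  # a flat series has no outliers
    return {i for i, v in enumerate(series) if abs(v - mu) / sigma > threshold}


def correlate_anomalies(telemetry, threshold=2.0, min_sources=2):
    """Map each time index to the sources that are anomalous at it,
    keeping only indices where at least min_sources agree."""
    hits = {}
    for source, series in telemetry.items():
        for i in anomalous_indices(series, threshold):
            hits.setdefault(i, []).append(source)
    return {
        i: sorted(sources)
        for i, sources in sorted(hits.items())
        if len(sources) >= min_sources
    }


# Hypothetical samples, one value per time step for each source.
sample = {
    "app_server_latency_ms": [100] * 7 + [400] + [100] * 2,
    "crm_response_ms": [50] * 7 + [300] + [50] * 2,
    "backbone_loss_pct": [1] * 3 + [9] + [1] * 6,
}
```

Calling correlate_anomalies(sample) reports time step 7, where the app server and the CRM system spike together, and drops the isolated backbone blip at step 3. A real platform would use far richer models, but the principle is the same: agreement between sources narrows the search for a root cause.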

To that end, an AIOps tool should deliver insights directly, so that no one has to pore over raw data to work out what is going on. The tool should tell an IT or Ops manager when something needs attention. The goal is for automation to cut the time spent on manual intervention and free up more time for the applications themselves.

A final word

Modern digital businesses need AIOps tools to enable continuous insights across an IT stack. Such insights are increasingly important as the systems that need monitoring and management become more complex, more distributed, and harder to control as tightly as when everything was on-premises.

In particular, modern apps make it harder to understand the cause of performance and reliability problems. While more monitoring and alerting capabilities are great, they can add to the workload of an already busy IT and Ops staff. That is why the industry is undergoing a shift away from separate network, application, and device monitoring tools towards artificial intelligence (AI) for IT operations, or AIOps for short.