Podcast: Understanding Continuous Intelligence and Its Importance in Complex Systems

RTInsights editors Joe McKendrick, Lisa Damast, and Sal Salamone discuss why continuous intelligence is needed to address the complexity of modern applications and operations.

In this RTInsights Real-Time Talk podcast, RTInsights editors Joe McKendrick, Lisa Damast, and Salvatore Salamone talk about how increased complexity is impacting operations and applications, why traditional monitoring and management tools fall short, and how AIOps and other solutions based on continuous intelligence can cut through the complexity and improve operations and security.

About the Continuous Intelligence Insights Center:

From real-time fraud prevention to enhanced customer experience to dynamic energy load balancing, businesses of all types and sizes are realizing the benefits of Continuous Intelligence, helping them make decisions in real-time while events are happening.

Where do you begin, though?

What are the key requirements?

RTInsights’ Continuous Intelligence Insights center, sponsored by Sumo Logic, brings together the latest insights and advice on continuous intelligence to answer these questions and more.

Read the podcast transcript:

Damast: Hello and welcome to the latest episode of Understanding Continuous Intelligence, an RTInsights Real-Time Talk podcast in partnership with Sumo Logic. I’m Lisa Damast, Senior Editor and Head of Marketing at RTInsights.com and your host for today’s discussion. Joining me are RTInsights Editor-in-Chief, Sal Salamone, and RTInsights Industry Analyst Joe McKendrick.


In this episode, we break down why CI is needed to address the complexity of modern applications and operations.

Joe, kick us off. Why is there so much complexity in modern applications and operations?

McKendrick: Okay Lisa, thanks. Where do I begin with that? Well, our enterprise systems, the systems we depend on to run our businesses, are more complex than ever. I mean, the complexity has only increased exponentially for several reasons. There are several things going on.

First, there are still a lot of legacy systems out there. There are mainframes. It’s estimated there are at least 10,000 mainframes, and about 80% of the Fortune 500, for example, still run their operations on mainframes. Not to knock the mainframe; System z is a beautiful machine. IBM keeps it up to date, but it’s all part of this legacy infrastructure.

There are Windows server farms out there, still legacy. You can call them legacy systems.

Add to that cloud. We’re moving into the cloud. A lot of companies are adopting clouds. Multi-cloud is a huge thing. A recent survey found that, for example, only 3% of companies report using one single cloud. The rest use multiple clouds for multiple types of applications and requirements.

So you have that layered on top. The push is on to digitize, to digitize everything they do and digitize their relationships with partners, their supply chains, meaning there’s a lot of activity inside the walls of the enterprise and outside the walls of the enterprise.

A lot of things are going on with IoT, for example, in edge computing, partners, customers, products, products themselves, sensors, and so forth, feeding in data into these systems. So there’s a lot you have to keep an eye on. It’s not like the old days where you had this server room with these servers, and you just had to manage and keep an eye on what’s happening in the server room. It’s now millions of server rooms everywhere.

Salamone: That’s what’s leading to some of the problems we’re seeing. Everything out there, every element, every service, is generating data to help you understand what’s going on. There are logs, traces, alerts from every system and device. This data can be overwhelming and really hard to assimilate. It’s one thing to get an alert, some sort of low-level alert that some server is at very high memory utilization, but it’s another thing to relate that back to say, “Oh, this major application is having poor performance. What is it? The connectivity? Are there too many users?”

To tie that one thing back is really hard, and there’s just too much data. And the same is true in terms of not just keeping things up and running but in protecting them on the security side. There are so many indications that may be a tip-off that something’s being breached, but you can’t connect all the dots. Typically with the existing tools out there, they’re all siloed, and you get overwhelmed with the amount of information to help you understand what’s going on. So that’s one thing.

The other thing is, with these complex applications today, there are so many hidden interdependencies that make it hard to sort out and understand what’s going on. There’s a great example from the start of the pandemic. With the national weather forecasting app, the short-term forecast quality declined significantly, and they just couldn’t figure out what it was. And it turned out that one of the big feeds, one of the big data sources for this, was temperature, wind, and pressure data from commercial airlines that automatically got fed into this model. And here you have this case where airline traffic shut down.

So here’s this major model suffering in quality, but to tie those dots together took quite a bit. So that gives an idea of the complexity and how it impacts things.

McKendrick: And I might add too, operations people may be well aware of this, but if you go higher in the ranks of the organization, there’s an assumption that by moving to cloud, you’re outsourcing a lot of these complexities, a lot of these issues with monitoring and interconnections and keeping track of the log data and things of that sort.

There’s an assumption, again, as you go higher up in the organization, that it’s turned over to the cloud. We don’t really need to do that stuff anymore. And that’s wrong.

Even AWS, it’s a huge platform. You still need to keep on top of what’s going on with your applications and your functions running out on AWS or Microsoft Azure, or what have you.

Damast: Great. Great points. So how does all this complexity impact operations and security? Joe?

McKendrick: Well, again, and Sal made a great point of this. The visibility is very difficult. It’s very difficult to get a sense, a holistic view of what’s going on across all these various systems and all these layers of systems that you have out there. And Sal, you just actually put together a great RTInsights article on AIOps. In fact, they had a couple of articles I saw running over there talking about the importance of AIOps.

There needs to be a way to automate this process, to automate your ability to observe what’s going on, your ability to apply security, to apply reliability across your systems.

Salamone: I think there have been some great examples in the last year with major problems on the operations side, keeping things up and running. Both Facebook and Amazon had outages, and the sources were so deeply embedded.

With Facebook, it was a routing protocol misconfiguration, and with Amazon, it was something that auto-scaled up in capacity and ended up causing some internal interactions between other elements that generated a flood of network connectivity traffic that overwhelmed the networking devices.

So, you figure these folks have the huge high-end tools to do it, but because of the complexity, you still can get caught, I think is one of the things that we’re finding. So you really need some insights into what’s happening.

And the other, on the security front, I mean, look at this with the Log4j problem, that’s really happening worldwide now. This software, the libraries that use it, are widely embedded into so many applications people have developed that companies don’t even know where it is. They’ve reused components, they’ve created applications out of libraries, and they’re having trouble finding even where it is. So there’s quite a bit that can cause problems because of the complexity.
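As an illustration of the "finding where it is" problem described above, here is a minimal Python sketch, not something discussed on the podcast. It assumes the components live on disk as JAR, WAR, or EAR archives, and it looks for the `JndiLookup.class` file that ships with Log4j 2 core, including inside archives nested within other archives. The function names `scan_archive` and `scan_tree` are made up for this example. A hit only shows where the library is embedded; it does not say whether that copy is a vulnerable version.

```python
import io
import zipfile
from pathlib import Path

# Class file that ships inside Log4j 2 core archives.
MARKER = "org/apache/logging/log4j/core/lookup/JndiLookup.class"
ARCHIVE_SUFFIXES = (".jar", ".war", ".ear")


def scan_archive(data, label):
    """Return labels of archives (including nested ones) that contain MARKER."""
    hits = []
    try:
        with zipfile.ZipFile(data) as zf:
            for name in zf.namelist():
                if name == MARKER:
                    hits.append(label)
                elif name.lower().endswith(ARCHIVE_SUFFIXES):
                    # Archives are often bundled inside other archives,
                    # so read the nested one into memory and recurse.
                    nested = io.BytesIO(zf.read(name))
                    hits.extend(scan_archive(nested, f"{label}!{name}"))
    except zipfile.BadZipFile:
        pass  # Not a readable archive; skip it.
    return hits


def scan_tree(root):
    """Walk a directory tree and scan every archive found in it."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES:
            hits.extend(scan_archive(path, str(path)))
    return hits
```

Calling `scan_tree("/path/to/deployments")` (a placeholder path) returns one entry per embedded copy, with `!` separating an outer archive from the archive nested inside it.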

McKendrick: And I’ll quote my friend, Andy Thurai from Constellation Research. When it comes to managing uptime across complex interconnected systems, hope is not a strategy. Maybe for our personal lives, it’s a great thing, but when it comes to understanding what’s going on across these various integrated, interconnected systems, you need visibility. You need visibility, and you need help from automation.

And I want to add here as well, you need to adapt to the culture. The culture is a huge part of this, the culture of the organization.

There’s been a lot of talk about DevOps, site reliability engineering, things of that sort. And it’s not something that just appears magically as you need it. You need to develop an understanding. I talked about higher-level executives. You need to increase. You need to understand that you need to do something about this mentality that has existed out there, that IT is a cost center.

It’s something that you need to maintain and grudgingly give some budget to, but it needs to be more than that. It needs to be something. You need to develop a culture and awareness about continuous monitoring or continuous intelligence and what’s going on across your system.

It’s something that affects the business. The business very much has a stake in what goes on. Sal talked about the outages with AWS, for example. Just about everybody seems to rely on AWS for their business. It’s not a blip that happens in the IT department that gets taken care of any longer. It’s something that dramatically impacts the viability of the business.

Damast: Joe, can you elaborate on how continuous intelligence, the importance of it, and how you bring this all together?

McKendrick: Well, there are several layers, I guess, you can look at. In continuous intelligence, it’s based on observability. You need to have observability across your systems, and this is especially important as AI develops, as AI becomes a greater part of your operations, the way you do business, because you need to understand the inputs and the outputs that are taking place across these systems, be it AI, or some kind of advanced analytics that your business is relying on. So you need that observability to understand not only what the output is but also the data, the viability of the data, the viability of the systems that are providing the data.

So you need this holistic view of what’s taking place across your organization. And automation can play a key role in this as well. These are all processes that could eventually be automated and therefore alleviate your operations people, or your DevOps teams, of a lot of the so-called toil that affects their jobs and enables them to not have to worry about all this stuff going on underneath, and look at it from a higher level, at a level that impacts the business.

Salamone: Just to add to that, I think there’s been this great shift. The focus on continuous intelligence is part of a bigger shift where we’ve been moving away from passively monitoring things to more proactive observability, I guess, is the new word. Trying to get some understanding of what’s going on between all these interdependent components and elements in a complex operation and then applying some intelligence, some AI to that, to try to make sense of it.

And I think where CI is really coming into play, continuous intelligence really coming into play, is not so much diagnosing problems but prioritizing things for the human to step in. So you may be flooded with tons of low-level alarms, all marked critical. Or not, but there are so many alarms, so how do you tell which is the thing to act on? So you need that to help resolve, to speed up the time it takes to resolve these performance problems, and to prevent attacks. So that’s where we’re seeing CI come into play.
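To make the prioritization idea concrete, here is a small Python sketch. It is an illustration only, not a method described by the speakers. It assumes a dependency map from applications to the components they rely on and a business weight per application; every name and number in it (`DEPENDS_ON`, `BUSINESS_WEIGHT`, `prioritize`, the component names) is hypothetical. Alerts are grouped by the application they affect and ranked by business impact rather than by their own severity labels.

```python
from collections import defaultdict

# Hypothetical data: which components each application depends on,
# and how much each application matters to the business (higher = more).
DEPENDS_ON = {
    "checkout": ["web-01", "db-01"],
    "reporting": ["db-01", "batch-07"],
}
BUSINESS_WEIGHT = {"checkout": 10, "reporting": 2}


def prioritize(alerts):
    """Group alerts by affected application, ranked by weight times alert count."""
    by_app = defaultdict(list)
    for alert in alerts:
        for app, components in DEPENDS_ON.items():
            if alert["component"] in components:
                by_app[app].append(alert)
    ranked = sorted(
        by_app.items(),
        key=lambda item: BUSINESS_WEIGHT[item[0]] * len(item[1]),
        reverse=True,
    )
    return [(app, [a["message"] for a in related]) for app, related in ranked]


alerts = [
    {"component": "batch-07", "message": "disk 91%"},
    {"component": "db-01", "message": "memory 97%"},
    {"component": "web-01", "message": "5xx spike"},
]
for app, messages in prioritize(alerts):
    print(app, messages)
```

The scoring rule here is deliberately crude. The point is only that a low-level alarm becomes actionable once it is tied back to the application it affects, which is the step a person would otherwise have to do by hand.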

McKendrick: And I always thought, one of the paradoxes of AI is that the purpose of AI is to reduce the need for human labor. Not talking about replacing human labor, but reducing the need for human labor and elevating people to higher-level tasks. But at the same time, you need more human labor to build and maintain and manage these systems, these AI systems, and to automate the automation, is a key element of what we need to do here.

Salamone: And what you’re seeing is CI is being applied in different application areas, including cloud performance management, application performance management, and security management. Now all of these fields have their own tools and have relied on them for many years. And this is just something maybe like a higher-level thing that can be used to help make sense of this bevy of tools that you have, the information that’s coming from all of them.

Damast: These are great points, Joe and Sal. Thanks for breaking this down for us today. For more information on continuous intelligence, visit the CI Insights hub on RTInsights.com.
