Observability in Action: A Conversation with Helen Beal (Video)
James: Greetings, everyone. Welcome to the video series Observability in Action, brought to you by Moogsoft and RT Insights. I’m Jim Connolly. I’m an editor with RT Insights, and with me today is Helen Beal. Helen is a strategic advisor to Moogsoft and a DevOps and Ways of Working coach. Welcome, Helen.
Helen Beal: Thank you very much, I’m delighted to be here.
James: Thank you for joining us. What is observability in an IT environment?
Helen Beal: So observability is a system characteristic. So it means that systems are able to be observed. You need a kind of conducive architecture in order for those systems to be observed in the first place, which means you need a culture that has the behavior of intending to build observability into those systems.
James: Well, how does it influence business outcomes?
Helen Beal: So, a key thing observability does is allow us to understand when something’s going wrong. What we’re really trying to say is that what people want to do is they want to create a sublime customer experience and the way that they’re going to do that is by getting things to them faster and making sure that they work properly. So what we need to do then is put ourselves in a position that when things go wrong, we have the best hope of fixing those problems as quickly as possible.
And this is where observability becomes important because that provides us with the data that allows us to understand what’s going on in our systems and allows us to make interpretations, get insights and decide what actions we want to take to remediate the problems.
So the key thing is to stop the system going down or, if it goes down, to get it back up and running as quickly as possible. A secondary effect is that all the time we’re working on that incident, which we call unplanned work, is time we’re not working on our planned work, and our planned work is generally going to be the differentiating features.
There’s a third one, though, that we often don’t talk about, which is that we have instability in our systems. We have something called technical debt, and it builds up over time. It happens when we do things like allow known defects to pass downstream when we’re cutting corners. We know they’re not quite right, but we never go back and fix them.
James: So who in an enterprise organization should care about observability?
Helen Beal: It’s definitely more of a priority for the technical teams, I think, than for the CEO. A CIO should absolutely be interested in this, and anyone working on the frontline is going to be interested in this as well. The CIO will almost certainly have targets around performance. They’ll be asked to report on older metrics around things like the amount of downtime and probably MTTR, as will our ITOps teams. The DevOps teams themselves would probably be concerned with MTTR too. I think pretty much everyone in the technology team should have this culture of building systems that have the capability to be observed within them.
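To make the MTTR metric mentioned above concrete, here is a minimal Python sketch of how a team might compute it from incident records. The transcript does not expand the acronym; this sketch assumes "mean time to resolve", and the `detected` and `resolved` field names are hypothetical rather than taken from any particular tool.

```python
from datetime import datetime


def mean_time_to_resolve(incidents):
    """Return the average minutes between detection and resolution.

    Each incident is a dict with hypothetical 'detected' and 'resolved'
    datetime fields. Returns 0.0 when there are no incidents.
    """
    if not incidents:
        return 0.0
    total_seconds = sum(
        (incident["resolved"] - incident["detected"]).total_seconds()
        for incident in incidents
    )
    return total_seconds / len(incidents) / 60


incidents = [
    {"detected": datetime(2021, 1, 1, 9, 0), "resolved": datetime(2021, 1, 1, 9, 30)},
    {"detected": datetime(2021, 1, 2, 14, 0), "resolved": datetime(2021, 1, 2, 15, 30)},
]
print(mean_time_to_resolve(incidents))
```

Tracking this number over time is one way to see whether better observability is actually shortening incidents.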
James: What’s it take to adopt and adapt observability for your organization?
Helen Beal: So as I said, it’s about what I call a conducive architecture, and by that I mean an architecture that understands it needs to be observable. So it needs to be embedded at the architectural level and then at the build levels. Observability is a characteristic of the system: that it is observable. The human still then has to observe it. We will do that mostly by using monitoring tools, and most organizations will have lots and lots of monitoring tools, monitoring different things.
The problem we have with monitoring right now is that we’ve got so much of it. As I said, most organizations have lots and lots of tools, and we have lots and lots of data. Additionally, in most businesses, 80% or more of the data that we’re creating is unstructured.
So in the context of observability, that unstructured data is things like log files, where we’ve got lots of text. We’ve created observability and telemetry everywhere, which is marvelous, but we’ve just created ourselves waves and waves and waves of more data. So we have our poor ITOps people and DevOps teams who are now overloaded with data and overloaded with alerts from monitoring systems.
So there’s a distinct phrase we use: alert fatigue. People have more data than they can possibly know what to do with. And this is where things like AIOps really step in.
James: Now I understand that there’s neuroscience behind observability. What role does it play?
Helen Beal: According to biologists, we have some of the most powerful brains on the planet, but they’re still not big enough to handle these incredible amounts of data.
So it’s really difficult, even impossible, for us to process these huge volumes of data using our own brains. And then there are a couple of other bits of neuroscience that come into play. One is what happens to our brain when we’re under pressure.
If we feel anxious, there’s a thing at the back of our brain, where our fight or flight system sits, that we call the amygdala. When we’re very anxious, that amygdala kicks in and it’s fight or flight. And what that does is interfere with the working memory in our frontal lobe.
Imagine we’re in our IT environment and our key system goes down and we don’t know why it’s gone down. Not only are we compromised in terms of how we can operate, we also don’t physically have enough brain to interpret all of that data that we’re receiving.
What helps us process it is AIOps. So we add algorithms over and above the data that can do things for us, like reduce the noise. A lot of those alerts that we’ll be receiving will be saying the same thing, so we can use AI to weed out all of those repeated messages. And then we can do correlations as well. The AI can start looking for things that are very difficult for us to see quickly, things like whether there’s a pattern between the topology, for example, of where that thing is coming from as to enter the application.
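The two ideas described above, weeding out repeated alerts and then correlating what remains, can be sketched in a few lines of Python. This is a deliberately simple illustration, not how Moogsoft or any other AIOps product works: it uses exact matching and a fixed time window where a real platform would use learned models, and the alert fields (`source`, `message`, `timestamp`) and function names are hypothetical.

```python
def deduplicate(alerts):
    """Collapse alerts that repeat the same (source, message) pair.

    Keeps the first occurrence of each pair and adds a 'count' field
    recording how many times it was seen.
    """
    seen = {}
    for alert in alerts:
        key = (alert["source"], alert["message"])
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {**alert, "count": 1}
    return list(seen.values())


def correlate_by_window(alerts, window_seconds=60):
    """Group alerts that occur within window_seconds of a group's first alert.

    A crude stand-in for correlation: alerts that arrive close together
    in time are assumed to belong to the same incident.
    """
    groups = []
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        if groups and alert["timestamp"] - groups[-1][0]["timestamp"] <= window_seconds:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


alerts = [
    {"source": "db1", "message": "disk full", "timestamp": 0},
    {"source": "db1", "message": "disk full", "timestamp": 5},
    {"source": "web1", "message": "timeout", "timestamp": 10},
    {"source": "web2", "message": "timeout", "timestamp": 500},
]
unique_alerts = deduplicate(alerts)
incident_groups = correlate_by_window(unique_alerts)
print(len(alerts), len(unique_alerts), len(incident_groups))
```

Here four raw alerts become three unique alerts and then two candidate incidents, which is the kind of reduction that leaves a person with far less to read during an outage.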
James: How does observability drive AIOps beyond incident management?
Helen Beal: Now we can leverage the observability and AIOps to give us insights as to where these problems are coming from. So now we’ve got our extra time, we can start doing things like automating repeat problems out of the way and paying down technical debt. Once we’ve paid down the technical debt and done some automation, just to kind of keep ourselves stable, we’ve released even more time, and we can move to another layer in our capability model, which will be about automating fixes.
And then the next level, once we’ve got all this time available to us, is to actually start building on that anti-fragility. What I mean by that is using techniques like Chaos Engineering, which is effectively a controlled way of breaking our system on purpose in order to practice fixing it and to find problems we didn’t know about. If you think about how pilots, for example, spend a lot of time in simulators practicing what to do when things go wrong, you can start to understand how this gets us to a point where we’re really reducing the chance of incidents and the impact of incidents.
James: Looking to the future, can AIOps now lead to AI development, AI DevOps, AI observability?
Helen Beal: I think there’s a huge number of different applications for AI throughout the DevOps tool chain. We can do things in development like help developers know what code snippets to look for. We can help automate which tests are going to run and automate interpretations of the feedback from tests. We can do all sorts of things around service desks in terms of automating the way that tickets are understood. The one that really excites me, though, is in terms of where we are in the DevOps tool chain. It starts when we have the idea, and that idea in a DevOps tool chain will sit in things like portfolio management and product backlogs. And it stops when that idea is realized by the customer. In fact, it doesn’t really stop, because when that value is realized, we receive more data about what the customer has experienced, and we should use that data as feedback and feed it back into our ideas cycle.
We can leverage the data that these observability tools or platforms are providing, interpreted by AIOps, around customer experience. Traditionally, when we look at value outcomes, it’s a really fuzzy area of IT at the moment. People say things like “value is in the eye of the beholder.” They probably don’t realize they’re doing it, but they’re thinking of two different types of value.
They’re thinking of business value and customer value. Business value is when we talk about things like profit and revenue, and that’s how we often measure a business. But that’s what I would call a lagging or indirect indicator or metric about what’s really going on at the customer level. We know that if we’re profitable, we’re probably doing something right with our customers, but we’re not getting that information directly from the customer’s mouth.
Customer value is things like the actual data telling us how customers are behaving: session length, bounce rate, basket sizes. There’s a lot of different data that teams could be collecting, and using algorithms on, to give them leading indicators on whether the thing that they have running is working.
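As an illustration of the customer-level indicators named above, the following Python sketch computes bounce rate, average session length, and average basket size from a list of session records. The record fields (`pages_viewed`, `duration_seconds`, `basket_items`) are assumptions made for this example, and the definitions used (a bounce is a session with at most one page view; basket size averages only non-empty baskets) are one common convention among several.

```python
def bounce_rate(sessions):
    """Fraction of sessions that viewed at most one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s["pages_viewed"] <= 1)
    return bounces / len(sessions)


def average_session_length(sessions):
    """Mean session duration in seconds."""
    if not sessions:
        return 0.0
    return sum(s["duration_seconds"] for s in sessions) / len(sessions)


def average_basket_size(sessions):
    """Mean number of items across sessions with a non-empty basket."""
    baskets = [s["basket_items"] for s in sessions if s["basket_items"] > 0]
    return sum(baskets) / len(baskets) if baskets else 0.0


sessions = [
    {"pages_viewed": 1, "duration_seconds": 10, "basket_items": 0},
    {"pages_viewed": 4, "duration_seconds": 110, "basket_items": 3},
    {"pages_viewed": 2, "duration_seconds": 60, "basket_items": 1},
    {"pages_viewed": 1, "duration_seconds": 20, "basket_items": 0},
]
print(bounce_rate(sessions))
print(average_session_length(sessions))
print(average_basket_size(sessions))
```

Comparing these numbers before and after a release gives a team a direct, early signal from customer behavior, rather than waiting for revenue figures.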
James: Helen, all of this is great information. I want to thank you so much for sharing it today with our audience, and please enjoy the rest of your day.