3175 Dynatrace Perform 2025: The Future of AI Observability and Automation
Tech Talks Daily · February 09, 2025
29:32 · 19.54 MB


In today's fast-moving digital landscape, IT teams are under immense pressure to maintain performance, security, and reliability while managing increasingly complex cloud-native environments. But as traditional monitoring tools struggle to keep pace, AI-driven observability is emerging as a game-changer.

In this episode, I sit down with Alois Reitbauer, Chief Technology Strategist at Dynatrace, to explore how AI and automation are redefining enterprise IT. Alois shares his insights on the role of predictive AI, AIOps, and automated observability in helping organizations proactively detect and resolve issues before they impact users.

We also dive into how Dynatrace is integrating AI-powered solutions to enhance performance monitoring, security, and cloud automation, making IT operations more efficient and resilient. Alois breaks down the latest innovations, including how AI observability supports large-scale cloud environments, reduces alert fatigue, and enables self-healing IT ecosystems.

As AI continues to transform enterprise technology, what does the future hold for IT teams? Can AI-powered observability help businesses scale without adding complexity? And how can companies harness Dynatrace's advanced AI insights to drive greater efficiency and security?

Join us as we explore these questions and uncover the latest breakthroughs shaping the future of IT operations. I'd love to hear your thoughts—how do you see AI observability changing the way businesses manage their digital ecosystems?

[00:00:04] What does the future of AI-driven observability look like? And how can businesses harness its full potential beyond just monitoring? Well today I'm thrilled to be welcoming back my good friend Alois, Chief Technology Strategist at Dynatrace. He's been on the show two times already in the last five years.

[00:00:24] But today I get a chance to sit down with him and complete his hat trick of appearances, but in person this time at Dynatrace Perform in Las Vegas. And we're going to cover everything from innovation in AI and observability. And for anyone that missed our previous conversation, my guest today, he spent his entire career advancing monitoring tools, optimizing application performance, and helping businesses navigate the complexities of IT operations.

[00:00:52] But now we're moving way beyond monitoring, because as AI continues to reshape business observability, my guest today is at the forefront of Dynatrace's strategy, ensuring enterprises can move from problem detection and more towards autonomous remediation. So as I sit here at Dynatrace Perform, where AI and automation are taking center stage,

[00:01:17] it's time to dive into how Dynatrace is not just integrating AI, but making it an integral part of an IT strategy. So how is AI transforming observability from a reactive tool to an active driver of business resilience and efficiency? Let's find out now. So a massive warm welcome back to the show. We've spoken many times over the years, but this is the first time we get to do it in person at Dynatrace Perform.

[00:01:46] But can you remind everyone listening who missed our previous chats a little about who you are and your role at Dynatrace? Yeah. Hello, I'm Alois Reitbauer, Chief Technology Strategist here at Dynatrace. I lead our AI efforts and also everything related to data ingest in Dynatrace, as well as our research department. So basically everything that's fun.

[00:02:06] Well, here we are in 2025, three years into the AI hype that we've seen building, and we are beginning to see maturity and measurable value from the technology now. So I'd love to learn more today about Dynatrace's AI vision. Just to set the scene for the conversation, can you tell me a little bit more about how AI has become integral to Dynatrace's operations, solutions and that mission towards business observability too? Yeah.

[00:02:33] For Dynatrace, the journey to use AI started over a decade ago when we realized we were constantly training people how to analyze performance problems and how to find the root cause. At some point we just thought, that can't be it. If we can teach it to people in two days, there must be a way we can automate it. And the idea was really more about automation. And then we said, okay, we need different components of AI to do this.

[00:03:01] We obviously need predictive AI to just figure out what is broken, which is one of the key questions when you look at AI. But then we said, okay, now we have everything that's broken. And that still is a very big mess, because it's this very long list of things that do not work, which doesn't help you in the resolution process.

[00:03:17] So we also need to figure out how are these things interconnected and how can we differentiate between impact and root cause, which then led to the creation of our causal AI engine where we combined causal and predictive AI in what is known as the Davis engine, which has been a core and integral part of Dynatrace for, as I mentioned, like over a decade right now.

[00:03:40] And it helped customers simply to solve their production issues faster, avoid false positives, and eventually also supported their quality of life, because usually you're called into a production incident when you're in the middle of something else. It's a very stressful situation. That was really our initial journey to AI. And then with the rise of generative AI, we realized, okay, this provides another two very great additions to what we're already doing.

[00:04:09] Number one is it provides a new interface, a non-expert interface to the product, or it just helps you in your line of thinking. When you're trying to solve a problem, it's much easier to formulate a question through language than to write it as a query or to think, okay, where do I find this data? How can I do this?

[00:04:25] And also enriching more and more context in written format, like on a problem, or if you find an issue somewhere, plus eventually leading into remediation, where you take some infrastructure as code, tell GenAI how it should be modified, and then automatically commit it to a Git repo or even deploy it to production in an automated or semi-automated fashion. And I'd love to dig a little bit deeper on this.
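The semi-automated remediation loop Alois describes, generate a change to infrastructure-as-code with GenAI, then gate it behind a human before it lands in Git, can be sketched roughly as below. This is an illustrative sketch only: `propose_patch` stands in for any generative-AI call, and the file layout, prompt, and commit message are assumptions, not a Dynatrace or vendor API.

```python
# Sketch of a semi-automated remediation loop: a model proposes an
# infrastructure-as-code change, a human approves it, then it is committed.
import subprocess
from pathlib import Path


def propose_patch(manifest: str, instruction: str) -> str:
    # Placeholder for a GenAI call such as "raise the memory limit to 2Gi".
    # A real implementation would send `manifest` and `instruction` to a model;
    # here we only demonstrate the shape of the pipeline.
    return manifest.replace("memory: 1Gi", "memory: 2Gi")


def remediate(path: Path, instruction: str, approved_by_human: bool) -> bool:
    """Apply a model-proposed change only if an operator signed off on it."""
    original = path.read_text()
    patched = propose_patch(original, instruction)
    if patched == original or not approved_by_human:
        return False  # nothing to change, or the operator rejected the proposal
    path.write_text(patched)
    subprocess.run(["git", "add", str(path)], check=True)
    subprocess.run(
        ["git", "commit", "-m", f"auto-remediation: {instruction}"], check=True
    )
    return True
```

The human-approval flag is the "semi" in semi-automated: the same loop becomes fully automated by wiring that decision to a policy check instead of a person.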

[00:04:50] So can you also expand on the company's overarching AI strategy and how it ultimately aligns with a mission to simplify IT complexity and drive innovation? And also there seems to be this pivot or move away from monitoring to this business observability, which seems to be a key thing. Yeah, I think we can't really simplify complexity. Complexity is just there, and I think it's even increasing more. Now we see the rise of what I refer to as like AI native applications.

[00:05:18] So now it's like these microservices of smaller AI models that interact with each other dynamically, so complexity will still increase. But overall the goal is, as I like to call it, really to build applications that are self-healing, so that can auto-remediate in most situations, that are self-securing, that can do pretty much the same from a security perspective,

[00:05:40] and also are self-optimizing, so reduce cost and identify areas where you can improve or optimize some of your IT landscape. So I think overall IT should move from, okay, I'm showing you on the dashboard what your data is, then we transition into, okay, I show you how you could potentially improve it. And the vision is to get active proposals, proposals for how to change it, that you can either accept or modify so that it works.

[00:06:10] And taking really the burden out of operations on all of these tasks. Eventually, obviously the idea is that there needs to be no more human interaction. I think we're still quite a bit away from this, but that's the overall idea. You really focus on the value creation and not so much on keeping things up and running, which we all know, like even from everyday life. It's easy to build a house, it's way more work to maintain a house.

[00:06:36] And you'd rather focus on the building, and then obviously the living, than on the time you have to spend building and maintaining it. And that's how business comes in there. It means taking business as a key input, where business in this case really means what is driving your business, what the key business functionalities are. I learned a new term this week actually. It is called the minimum viable enterprise. I don't know whether you've heard of it.

[00:07:00] So the minimum viable enterprise is a concept used by companies that specifies the key systems or key business workflows that need to be up and running so your business works. Basically, if these things stop working for a longer period of time, and depending on the business this might just be hours or days, you're basically done as a business.

[00:07:27] And companies start to realize that they have to look at these business processes. And the big difference between looking at a business process and looking at an application is that a business process often is more than one application. It might be multiple applications. Say you order something from an online store. If you look at the entire process, from going to the website, eventually clicking the button, to it being shipped and delivered to your house, there are a number of applications involved. There's the front store application. Then there's the warehouse application.

[00:07:57] And then there's third parties, like the delivery service and so forth that eventually all have to play together. And as people gained more insights into the individual applications, they started to realize there are still gaps in between those. So I really want to get like a full picture. And that's why we see customers now driving us more and more to, can you link this together, but also provide this information at a different abstraction layer.

[00:08:21] Like in observability, we talk a lot about traces and getting into all the details, or what we announced around live debugging, even seeing the individual parameters of service executions or function calls. At a business level, I care about way higher-level building blocks. So did the order work? Did the shipment work? Did the pickup work? Whatever it is. And also speaking the language of the business.

[00:08:49] And I think this is where the business aspect eventually comes in there. And I remember very early in the APM days before observability, we used to say it doesn't help if your infrastructure is up and running. You need to know whether your applications are actually up and running. And here is just like the next level. It doesn't help if all of the individual applications are up and running, but your customers do not get the experience end-to-end that they want to have.

[00:09:15] And this obviously goes way beyond the applications you're running and even the parts of the process that are simply in your control. And there is so much noise around AI at the moment, and there has been for a couple of years. And the observability space is still emerging and evolving, and there's no clear leader in that space. So how do you think Dynatrace's approach differs from other solutions out there or competitors in the market? What makes you different?

[00:09:41] With the approach I described before, Dynatrace decided to build AI into the platform, which is what you would today call an AI native application, really having AI at its core. What you see a lot of others doing is just sprinkling some AI on top. So it's enhancing some functionality with AI or making it easier to access.

[00:10:03] And in many cases, it's like, okay, you have a chatbot that is somehow integrated or, again, some query creation support. But it's not like really built as a core part into the platform.

[00:10:17] Like what we did was what we refer to as hypermodal AI, the active combination of predictive AI to find issues and also predict future behavior, together with the causal engine that can do impact and root cause analysis, with generative AI to make active proposals. But it doesn't actually start there. People always talk about, okay, this is the AI, but the AI is nothing without its data. So what we also did, we built what we call SmartScape, which is actually two things.

[00:10:46] It's a semantic model of how an IT system works. But it's more than that. It also has knowledge built into it for things that are very obvious to us. If a service is running on a machine, and that machine is running low on CPU and the service is slowing down, these two things are related to each other. So having this semantic model on top that can be used in real time is also key. Plus, underneath there's Grail, our data storage layer.

[00:11:15] You can ask it basically any question because it's indexless storage. And combining all of this together, I think, is what really makes a difference and what really enables us to do these very advanced use cases. And especially when we then talk observability, kind of jumping out of just observing, as the name implies, to actually acting. The ability to seamlessly integrate this with an automation engine to actually take action is what really makes the complete picture.
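To make the impact-versus-root-cause distinction concrete, here is a minimal sketch of the principle behind combining anomaly detection with a topology model like SmartScape: an anomalous component whose dependencies are all healthy is a root-cause candidate, while anomalous components downstream of another anomaly are merely impacted. The topology and component names are invented for illustration; this is not the Davis engine.

```python
# Toy dependency graph: component -> components it depends on.
DEPENDS_ON = {
    "frontend": ["checkout-service"],
    "checkout-service": ["payment-service", "inventory-db"],
    "payment-service": ["host-42"],
    "inventory-db": [],
    "host-42": [],
}


def root_causes(anomalous: set[str]) -> set[str]:
    """An anomalous component is a root cause if nothing it depends on,
    directly or transitively, is also anomalous; otherwise it is only
    impacted by a deeper failure."""

    def has_anomalous_dependency(node: str) -> bool:
        for dep in DEPENDS_ON.get(node, []):
            if dep in anomalous or has_anomalous_dependency(dep):
                return True
        return False

    return {n for n in anomalous if not has_anomalous_dependency(n)}


# host-42 runs low on CPU; everything above it slows down and alerts.
alerts = {"frontend", "checkout-service", "payment-service", "host-42"}
print(root_causes(alerts))  # {'host-42'}
```

Four components alert, but only one action is needed, which is exactly the long-list-of-broken-things problem Alois describes the causal engine solving.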

[00:11:46] And the one thing about Dynatrace is that this is not only the case for the apps inside the platform that we are providing, but this is also available to all of our customers. So they can build, on top of these functionalities, their own remediation workflows, their own apps and their own ways of how they want to automate and use this. That was one of the major changes we made to the AI engine that we had in the past.

[00:12:12] Today, in this new iteration, we're actually exposing individual AI capabilities so that you as a customer can use the predictive functionality, for example, and build a workflow that allows you to do things like preventive operations. Incredibly cool. And I love what you said about some companies having that sprinkle of AI on the top. And I think we've all seen examples of that.

[00:12:32] And just to further bring to life everything we're talking about here for business leaders, are you able to share specific examples around how Dynatrace's AI drives automated problem detection, resolution and optimization at scale? Because that's the real cool stuff, isn't it? Yeah, eventually you have to get business value back. That's why you invest in observability. I mean, it is core to understand where things are going wrong, but eventually you want to look at the ROI of this.

[00:12:59] So I think the key is, first of all, you need to figure out, is everything working fine? What predictive AI really does and where it helps you is doing this across the board in a highly opinionated, optimized way, meaning looking at anomalies in the system. Okay, this is working. This is not working. That already helps. You don't have to take care of any of this.

[00:13:29] Then, when things go wrong, the majority of the time is really spent on resolving those problems. The key business value is, in the past, you had a lot of people who needed to come to a room, worst case everybody, or at least one person from every department. And then you had to figure out whose problem it was. And if it's just one service and you run 1,000 microservices, there would be a lot of people involved in this process who shouldn't even be there and who also can't contribute to the solution.

[00:13:57] I think that's one key area there where we're helping them. And then when we talk about resolution, you would want to do two things. When we usually talk to leaders in the SRE and IT operations space, they mention they have a lot of tasks that they have to do right now manually. And they don't know how to automate them. It gets to like regular maintenance tasks that are very much event-based. It's what we started to do with DevOps automation.

[00:14:28] If something happens, you react to it. However, that only gets you so far, because you might not even have an event. At some point, you need to realize, is the situation okay or is it critical? And here predictive AI, for example, can help you. One example there is customers are using us to automatically resize, scale up, scale down their environments. You look, okay, what's the current behavior? When do I have to take an action?

[00:14:53] And this suddenly becomes dynamic: for the "when" there's no dedicated event, you have to figure it out based on data, when most likely something is going to happen and how quickly it's going to happen. And it's also not exactly clear what that action is going to be. So predictive AI can also help you understand this. And then you take this and automate it into a workflow, so that you automatically resize your disks, or horizontally or vertically scale your environments.
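The core of that "there is no event, you predict the when" idea can be shown with a toy forecast: extrapolate a trend from recent usage samples and act when the projected time to exhaustion drops below a window. A real predictive engine would use far richer models and seasonality; this is just a minimal sketch with invented numbers and thresholds.

```python
# Toy "predictive operations": least-squares trend over evenly spaced
# samples, extrapolated to the point where usage would hit capacity.


def hours_until_full(samples, capacity_gb, interval_hours=1.0):
    """Return projected hours until `capacity_gb` is reached, or None if
    usage is flat or shrinking (no action needed)."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    return (capacity_gb - samples[-1]) / slope * interval_hours


usage = [410, 418, 425, 434, 441, 450]  # GB used, one sample per hour
eta = hours_until_full(usage, capacity_gb=500)
if eta is not None and eta < 24:  # act within a day's lead time
    print(f"Disk predicted full in {eta:.1f}h, trigger resize workflow")
```

The decision to resize fires well before any threshold event exists, which is the difference between event-driven automation and predictive operations.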

[00:15:22] The value, obviously, is that you can do this much more real-time, not affecting users. But the additional value is what would you do if you have to do it manually? So assume it's Friday afternoon. You think, well, better take care that the system doesn't have any issues over the weekend. You will most likely approach it with a better safe than sorry type of approach.

[00:15:44] Meaning, yeah, you scale it up much higher. And we've seen, just by using predictive operations, that the over-commitment people are running on their infrastructure just to be safe can go down somewhere in the range of 30 to 50%. I mean, nobody wants their IT systems to go down on the weekend when you're out at the beach playing with your kids. So you'd rather err on the safe side.

[00:16:12] And then the next level is really proactively finding areas where you can optimize. And another side effect is as you do this now with software, you're freeing up a lot of the people who had to do this before. So it's obviously we all want to work more efficiently. We want to use our workforce more efficiently. Number one, because obviously it's a cost issue. Number two, it's very hard to find highly qualified people.

[00:16:40] And it's not very smart to let highly qualified people do very mundane tasks. That really allows you to invest in what really brings you forward, what generates value for your company. That might be that GenAI project that you're working on, that new mobile application that you're investing in, or whatever you think is going to be the next big thing for your company.

[00:17:02] But by doing all of those other things, it's this indirect ROI that suddenly you have way more people, way more talent available to also work on this. Yeah, I'm completely with you on that. So when we're looking at observability, it is relatively new. It is continuously evolving. It has become almost impossible to predict the future. But where do you see observability going over the next, what, three to five years? I know that's an impossible question to ask, but how do you see it naturally evolving? I think it's not that impossible to predict.

[00:17:31] Listening to our customers, we have seen this shift really start last year. Before that, everybody was scared of the observability solution starting to take action, not just proposing to take action, but actually taking it. But starting last year, we have seen in our customer base that this has become a strategic agenda for companies, coming under different names. Zero incident policy.

[00:18:00] 100% IT automation. I guess these are more the lighthouse goals that people have. And that's what you hear people now saying more and more. Yes. Now that we can identify a problem, isolate the root cause, and have access to a knowledge base that can take action via GenAI, we want to get to a point where we do not really have to manually interfere with the system to get it back into a healthy state.

[00:18:28] I think that's where we're going in the longer run. Full automation might never really happen, but I think we're getting closer and closer and closer there. And that's where I see the overall direction. What I also see is that domains start to blend now more and more. We talked obviously about business observability and understanding the business process together with what's happening on the application layer.

[00:18:56] Also, security and observability are converging more and more. And security data in combination with observability data is way more valuable. As soon as we start thinking about, for example, cloud posture management or posture management in Kubernetes, the real impact of a CVE depends on how your application is structured and so forth. We see these things coming way closer together. So people would rather centralize it.

[00:19:27] Because I think people start to realize that having five single sources of truth is not where they want to be. And it also starts to become a storage problem because why are you storing the same data across more and more systems? So I think we are closing in on having like one central storage layer and building on top of the storage layer. To be fair, that has been tried a lot of times in the past.

[00:19:53] But with new storage technologies, like what we are doing with Grail, which can work with really large amounts of data, doesn't require any re-indexing, acts in a basically indexless way and can run highly parallel queries across terabytes of data in milliseconds, I think there are way more possibilities to do this.

[00:20:18] Because very often this failed already at the definition of a data model, which you suddenly no longer need: in combination with not having an index, you don't need a schema. If you look back, one of the key drivers for observability was moving towards high-dimensional data, which is to some extent another way of saying almost schemaless data, not entirely, but going in that direction.

[00:20:45] And the more we move there, we're starting to enable those use cases, having that convergence and really working on a central source of truth that is combined with other sources on demand, obviously, for more detail. And then I think it's really about using the data and making it actionable. And I'm curious, because it feels like there's a lot of buzz around observability. I mean, a lot of excitement, a lot of opportunities there. And a lot of your competitors might be thinking of rushing into that space too.

[00:21:15] So how do you at Dynatrace plan to stay ahead in this rapidly evolving AI-driven ecosystem that we're talking about? I think the way you stay ahead is, as we like to say, you listen to your customers, but you do not necessarily do what they want you to do. Even here this week, I had a lot of conversations with customers about what their challenges are, what their problems are. And there's always a next thing.

[00:21:42] Whenever you solve one problem, you're climbing up the ladder to the next. And that's where you have to stay super customer focused. What type of problems do they want to solve? And then you really work on, okay, what is really the best solution? Because people, especially people using your product, often think of ways they can solve it with the product that you have today. Like, how can I combine or use what I'm using today?

[00:22:09] What you do as a product person is to some extent, yes, on the short term, that's what you're trying to do. But on the midterm, you want to build a new innovative solution that might be much better than this. And on top of this, what you're asking yourself at the strategic level, and this is what we have been doing at Dynatrace a couple of times. That's now also with like the third incarnation of the Dynatrace product.

[00:22:36] You ask yourself the question: which product would I need to build that is so significantly better or more advanced than everything out there today that even we would not know how to compete with a product like this? So you have to challenge yourself. There are really these three time horizons, each with different thinking. How can I solve it for the customer tomorrow with what I have in the product right now? But actually having in your mind what would be a really innovative, better solution to this.

[00:23:06] And then you think about what's going to be the disruptive solution, really living on these three time horizons. And at every tech conference, there's always a big theme that dominates conversations on the show floor away from the keynotes. You say you've been talking to a lot of clients as well, on and off stage. So from everything that you've heard this week, is there a big topic that's exciting attendees at Dynatrace Perform? I would say it's two topics. One is clearly AI. And one is using AI for observability.

[00:23:36] And the other side of that is observability for AI-based applications. The second topic is around developer observability, really driving down from operations seamlessly into development, with some announcements we made, for example, around the ability to live debug your applications. That's definitely key. Or I would even say what we did in the security space around cloud posture management.

[00:24:01] You could say this is production, but it was very much development, because who is eventually writing the infrastructure-as-code scripts? And it's about building a more integrated solution. Because in the past, even we always had to choose, with that more traditional UI approach where you get this screen on the left, you have that menu bar, and then you have all the services, infrastructure, cloud, blah, blah, blah. So you always had to pick your audience. And you couldn't pick two. What's relevant to one is not relevant to the other.

[00:24:31] But with this new app concept, we can start to branch out into different audiences: build something specifically for a developer that looks at very fine-grained data, really finding the needle in the haystack, and then the more operational view, overlooking your entire software estate and finding, okay, is it actually hay or is it grass? Is it burning? Is it fine? So that's the two things really.

[00:24:56] Everything we saw around AI on the one hand, and everything around getting developers in there and having this seamless transition from an operational SRE mindset to a developer, giving them the same tools that they need, but obviously more tailored to their specific needs. And this week I've seen you rushing around the site, back-to-back interviews. I've seen you on stage. When all the adrenaline begins to drop a little bit, you're on that plane ride home and you reflect on everything you've seen and heard this week.

[00:25:26] What are you going to be thinking? What will I be thinking? Well, it's actually getting back to work then on Monday. I think it was successful. And I think it continues this shift that I started to see four years ago, with every year's Perform bringing new validation that we are more and more moving away from talking about features, towards talking about use cases and even about organizational change. And we see this happen more and more.

[00:25:56] And I think it's also a good sign of how a space and a technology mature. When it's no longer, we have feature X, we have feature Y, feature X got slightly better, but you start to talk about new use cases, how you enable new ways of people working together, and really being on this trajectory.

[00:26:18] And the other one is, okay, also thinking as a product person, how much acceleration we really see in technology. It feels like the adoption speed and the adoption appetite for new technologies is constantly increasing. And that's obviously also challenging us as a company. How can we keep up and build faster and faster and faster in a space that's evolving very quickly?

[00:26:48] Like what we're talking about right now in the AI space, for example, changed just a couple of weeks ago. And that's the innovation speed that we're on. And the key for us is how can we make this easier, safer for customers and give them more confidence as they are building it. So I think that will keep me going, but still seeing, okay, we're talking on a transformational level and no longer at the tool level. So much for everybody listening to think about here as well.

[00:27:17] We've covered a lot in a 35-minute podcast, but for anyone listening who wants to dig a little bit deeper on anything we talked about today, whether it would be the announcements, whether it would be your insights, where would you like to point everyone listening? Yeah. I mean, I would go to the Dynatrace website if you're really interested in everything we're doing. We have this great thing we call the Dynatrace Playground. It's real-world use cases. Just looking at how observability can help you to solve those problems.

[00:27:43] It's more of these real-world use cases, which we have taken to an interactive, product-type demo. Also, obviously, explore the videos once they're available. I mean, Perform is physically over, but you can still sign in online and get access to it. Awesome. Well, I'll have links up to absolutely everything so people listening can find all that stuff nice and easy. But we're reaching the end of Dynatrace Perform. You can head to that airport very soon. But more than anything, just thank you for taking the time to sit down with me today.

[00:28:13] Thank you for having me. So with AI becoming deeply embedded in business operations, I think the shift from monitoring to intelligent observability is way more than just a trend. It's becoming a necessity. Whether that be from predictive insights to autonomous IT operations, Dynatrace is pushing the boundaries of what's possible. But the bigger question is, as AI evolves, so will the challenges. So what is next for AI-driven observability?

[00:28:42] Will businesses fully embrace autonomous IT management, which does feel like a pretty big move for a department that has always been incredibly cautious? Or will the human factor always play a critical role in decision making? I suspect the truth is somewhere in the middle there, but let me know your thoughts. Email me now, techblogwriter@outlook.com. Instagram, X, just @neilchews. We'll keep this conversation going. But it's time for me to hop on the plane back to the UK now.

[00:29:12] Big thank you to everyone at Dynatrace for looking after me this week. And hopefully you will join me again tomorrow where I will be back in the UK to record this show. But that's it for today. Speak with you all tomorrow. Bye for now.