3172: Unlocking the Power of Unstructured Data: AI, Storage, and the Future of Data Management
Tech Talks Daily, February 06, 2025

Unstructured data is growing at an unprecedented rate, now making up 90% of enterprise data—but how do businesses harness its full potential?

In this episode, I sit down with Krishna Subramanian, co-founder and COO of Komprise, to explore how companies can optimize, manage, and extract value from their unstructured data across hybrid and multi-cloud environments.

Krishna shares insights from over a decade in the industry, explaining how data has outgrown traditional storage solutions and why independent data management is critical for organizations to avoid vendor lock-in. We discuss how AI is revolutionizing data strategies, from optimizing storage costs to preparing unstructured data for AI-driven workflows—a challenge every enterprise is now facing.

We also dive into the hidden costs of cold data, how Komprise's Global File Index brings structure to unstructured data, and the role of transparent data tiering in reducing storage expenses while maintaining easy access. Krishna highlights why data governance and security are more important than ever, especially as enterprises explore AI adoption and face increasing ransomware threats.

As AI becomes more deeply embedded in business processes, how can organizations ensure their data is AI-ready? What steps should they take today to future-proof their data strategies? Join the conversation and discover why unstructured data management is the next big frontier in enterprise IT.

[00:00:03] Welcome back to the Tech Talks Daily Podcast, where every day we explore the latest innovations shaping the future of technology. And my guest today is the co-founder and COO of Komprise, a company at the forefront of unstructured data management.

[00:00:21] We first met at the IT Press Tour in Silicon Valley, and after having a fascinating discussion, I knew I had to get her onto the podcast today, not only to share her insights on the evolving role of unstructured data in enterprise IT, but also the value that it's helping to unlock. Because unstructured data, which is essentially everything from videos to documents to x-rays and just about every file you can think of, makes up 90% of enterprise data.

[00:00:51] And yet, organisations are still struggling with how to manage, optimise and extract value from that data. So, here in 2025, as AI adoption accelerates, businesses are beginning to realise they need effective data governance. Especially if they're serious about ensuring their corporate data fuels AI-driven insights, without creating security risks and spiralling costs. And I suspect a few of you listening have seen just that.

[00:01:21] So, today we're going to talk about how unstructured data management has evolved over the past decade, why independent, storage-agnostic solutions are essential in a multi-cloud world, and how data governance and AI-driven insights are shaping the future of enterprise storage. So, if your company is drowning in unstructured data and wondering how to make it work for you, rather than against you, this is a conversation you are not going to want to miss. But enough rambling and scene-setting from me.

[00:01:51] Let's get today's guest on now. So, a massive warm welcome to the show. Can you tell everyone listening a little about who you are and what you do? Thank you for having me on, Neil. I'm Krishna Subramanian. I'm a co-founder and COO of Komprise. We are a data management company that focuses on unstructured data management. And there is so much I'm looking forward to speaking with you about. We met at the IT Press Tour in Silicon Valley.

[00:02:20] There is an increasing focus on unstructured data management. But I'm curious, if you look back at your career, how has enterprise unstructured data management evolved over the last 10 years? And what role has Komprise played in addressing some of these challenges? Because there's a lot of hype around it right now, but this is something that's been very close to your heart and you've been passionate about for many years, right? Yeah, it's a great question, Neil. Well, it kind of goes back to why we started Komprise.

[00:02:48] Because we started Komprise about 10 years ago. And for myself and my two co-founders, this is our third company together. Our prior two companies also had something to do with data. And both those companies got acquired. And after our last company got acquired, we were talking to a lot of our customers from our last two companies. And they told us that they were starting to drown in unstructured data.

[00:03:15] Basically, unstructured data, it's data that doesn't sit in a database. It's not spreadsheets. So it's not tabular data. It's all the other stuff. It's stuff like documents, videos, audio files, genomics data, oil and gas data, x-ray images, all that kind of stuff, which people used to just put in a file share and kind of forget about it, really. But they started seeing that that data was growing in double digits.

[00:03:42] And most applications now generate unstructured data. And when we started Komprise, I think about 60% of data was unstructured. And today, 90% of data is unstructured. So that's how rapidly unstructured data is growing.

[00:03:59] And so that's why we created Komprise, to really have a layer separate from storage that can look at unstructured data across all your storage and cloud silos to give you a view of how much unstructured data you have, and to optimize its costs, because 30% of IT budgets today are spent on storage and backup. And more and more, that budget is being spent on unstructured data.

[00:04:27] And 80% of that unstructured data is cold. So you can really reduce the cost if you target data management. But then AI now all runs on unstructured data, especially corporate data. That's going to be the next big thing. How do you use corporate data with AI? So we feel like we're in the right market at the right time. We started this company based on customer feedback about why unstructured data is becoming a big problem.

[00:04:54] And fortunately for us, everything our customers told us before has turned out to be true. And now AI is the next big thing for unstructured data. Yeah, that's one of the reasons I felt it was so important to talk about how it has evolved. Because right here, right now in 2025, AI is relying heavily on unstructured data. And I'm curious, what are the biggest challenges that enterprises are facing when they're beginning to prepare their data for these AI-driven workflows?

[00:05:24] Or a multitude of AI-driven workflows in their business? An interesting thing is that I think as this AI market is sort of taking us by storm, and a lot of us are starting to realize the potential of generative AI in particular, I think there's a lot of confusion as well in the market. A lot of people thought, okay, AI needs all these GPUs and performance, and it's going to be all about higher performance. And maybe storage needs to also be GPU-enabled and perform well.

[00:05:52] And not that it doesn't. Yes, that is important. But I think the point that things like DeepSeek have really shown us is that model training is only about 5% of the market. 95% of the market is using a trained model. And the cost of model training will keep dropping over time. So yes, performance is important. But even more important than performance in the long run is how your corporate data connects to AI.

[00:06:21] Because if you think about it, AI has already been trained on all the data that's out in the public domain. So the competitive differentiation that any enterprise can have in using AI is to tie it to their corporate data and to get more intelligence from their own data. So the tying of corporate data to AI is the next big frontier. And 90% of that data is unstructured.

[00:06:47] So tying unstructured data to AI with proper data governance and security is like the key thing for a business. And that's the problem we are focused on. Yeah, I completely agree with the phrase you used there as well, how it is the next big frontier. Yeah, I'm curious from your viewpoint here, why is independent unstructured data management so critical in today's multi-cloud and hybrid environments that every enterprise finds themselves in?

[00:07:17] And how do you at Komprise differentiate yourself from so many different vendor lock-in solutions out there? If you think about it, when we started the company, a lot of our customers said, our storage products do provide some data management. And that is true. Even today, your storage platform has some ways to reduce storage costs and to improve efficiency or to get visibility into your data.

[00:07:42] But the real challenge for businesses is that 97% of businesses use more than one storage architecture and they use more than one storage vendor. I mean, almost all businesses today are hybrid cloud. So already you have at least two vendors. You have your on-premises storage vendor and you have a cloud vendor. And most of them also grow through acquisitions, so they have other platforms they get from other people.

[00:08:08] And so it's just that it's a multi-vendor, multi-architecture world. And you need a way to look across all of those silos. You can't manage data in each silo because data outlives any storage. The average lifespan of data in an enterprise is 20 years. Most people don't delete data earlier than that. And the average lifespan of a storage solution is three to five years.

[00:08:37] So you're probably going to go through three to four storage refreshes before you delete your data. So that means you need to have visibility into your data for longer than any one storage architecture lasts. That's why you need a storage-independent solution that looks at data across all your environments and places the right data in the right place at the right time. And lets you move from one solution to another without having to rehydrate all the data.

[00:09:06] Because that's the other problem when you use storage-based data management. It's proprietary to that storage. So when you want to switch to another storage, ironically, you have to buy more of their storage and rehydrate all the data to move off of them. That's the vendor lock-in. And with Komprise, you don't have to do that. We work through open standards. We keep data native wherever we move it, meaning you don't even need Komprise to get your data. You can directly go and get your data from anywhere.

[00:09:36] It's always in the customer's control. And we provide transparency across these architectures. That's kind of where our patents are. So you can still get the benefit of what looks like a seamless architecture, but you can have choice and you're not locked in. And Komprise has also introduced several new capabilities for sensitive data management that I was reading about before we came on the podcast together today.

[00:10:03] So what specific problems do these features solve for any IT storage and security teams listening? Yeah. So, Neil, just to give some background on what we do, you can just point us at all your data centers and your clouds and Komprise will find all the storage in those environments. It will show you analytics on your data. It will show you how much data you have, how much of that is cold, what kind of data it is, who's using it, all that kind of valuable information.

[00:10:31] But all of that is based on the metadata. It's based on the data that's already there, maybe in storage systems. It's just distributed all over the place and we are indexing it in one place. But a lot of times you want to know more than that. You may want to know, you know, do any of these files have sensitive data in them? Is sensitive data sitting in places where it shouldn't be sitting because it's a liability?

[00:10:57] Especially for AI, when you're thinking, let me find some files and maybe augment a prompt with corporate data, you want to make sure you're not sharing sensitive corporate data mistakenly with an AI workflow.

[00:11:12] So for these reasons, we just launched our sensitive data detection capabilities where Komprise itself will not only look at metadata, but will look at the file contents and will be able to tag which files have sensitive data and what kind of sensitive data. And the reason that's useful is because you can then either move sensitive data out of places where it shouldn't be because we then allow you to take action on it.
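The detect-and-tag idea described here, together with leaving tagged files out of an AI ingestion list, can be sketched roughly as follows. This is a minimal illustration and not Komprise's implementation: the two regular expressions, the `ScannedFile` type, the function names, and the file paths are all assumptions made up for the example, and real sensitive-data detection needs far more than pattern matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; not a complete PII detector.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class ScannedFile:
    path: str
    tags: set[str] = field(default_factory=set)


def scan(path: str, text: str) -> ScannedFile:
    """Tag a file with the name of every pattern found in its contents."""
    found = {name for name, rx in PATTERNS.items() if rx.search(text)}
    return ScannedFile(path, found)


def ai_ingestion_list(files: list[ScannedFile]) -> list[str]:
    """Keep only the files that carry no sensitive-data tag."""
    return [f.path for f in files if not f.tags]


# Made-up example files (path, contents).
files = [
    scan("/share/hr/payroll.txt", "Employee SSN: 123-45-6789"),
    scan("/share/docs/roadmap.txt", "Q3 goals: ship the new dashboard"),
    scan("/share/sales/leads.txt", "Contact jane@example.com about renewal"),
]
print(ai_ingestion_list(files))  # only the roadmap file passes the filter
```

The tags record what kind of sensitive data was found, so the same scan result could drive either action mentioned in the conversation: moving the file somewhere safer or excluding it from a workflow.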

[00:11:41] Or using our smart data workflows, you can create an AI ingestion workflow and exclude sensitive data from it. So you can use that knowledge in different ways. And just to highlight the significance of what we're talking about here, can you tell me a bit more about the importance of Komprise's global file index and transparent move technology and how these innovations ultimately will enable enterprises to manage their data more efficiently?

[00:12:10] Because that's the big value add here, isn't it? Yeah, exactly. So when you point Komprise at all your storage, Komprise not only gives you analytics in the aggregate on how much data you have, how much is cold and things like that. We are actually indexing all the metadata into what we call a global file index. So we're basically building structure around unstructured data.

[00:12:35] And the reason that global file index is important is because it sort of becomes your meta database. It becomes a way where you can search across billions of files and find just the files of Neil's podcast recordings that had to do with AI. You could just search and just find those. And then you could say, OK, now let me have a generative AI application, maybe summarize those and create a nice article from it so that it can then link to all those podcasts.

[00:13:05] And I could use that for SEO, for marketing. So for these kinds of applications, if you want to do them with your unstructured data, you need a way to search across all your data assets. You need a way to index them. You need a way to enrich metadata. So maybe after that AI is finished, we could be adding tags saying, OK, this podcast actually talked about generative AI and it talked about data governance.

[00:13:33] But this other podcast talked about machine learning and it talked about this other thing. So we could be adding those tags in our global file index. So we're continually enriching the metadata around the data. So Komprise is providing a way to bring structure to unstructured data.
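As a rough picture of what a metadata index with tag enrichment and search looks like, here is a toy in-memory version. It is a sketch under stated assumptions, not how the global file index is actually built: the `FileRecord` and `FileIndex` names, the storage paths, and the second episode are hypothetical, and a real index spanning billions of files would sit on a proper search backend.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    path: str          # where the file lives, in any storage silo
    size_bytes: int
    tags: set[str] = field(default_factory=set)


class FileIndex:
    """Toy in-memory metadata index; an illustrative stand-in only."""

    def __init__(self) -> None:
        self._records: dict[str, FileRecord] = {}

    def add(self, record: FileRecord) -> None:
        self._records[record.path] = record

    def enrich(self, path: str, *tags: str) -> None:
        """Attach extra tags, e.g. topics produced by an AI summarizer."""
        self._records[path].tags.update(tags)

    def search(self, *tags: str) -> list[str]:
        """Return the paths whose tags include every requested tag."""
        wanted = set(tags)
        return sorted(p for p, r in self._records.items() if wanted <= r.tags)


index = FileIndex()
# Hypothetical paths on two different silos (a NAS share and an object bucket).
index.add(FileRecord("nas1:/podcasts/ep3172.mp3", 20_060_000, {"podcast"}))
index.add(FileRecord("s3://archive/podcasts/ep2001.mp3", 18_500_000, {"podcast"}))

# Enrichment step: tags added after some analysis of the content.
index.enrich("nas1:/podcasts/ep3172.mp3", "generative-ai", "data-governance")
index.enrich("s3://archive/podcasts/ep2001.mp3", "machine-learning")

print(index.search("podcast", "generative-ai"))
```

The point of the design is that the search runs over metadata only, so it does not matter which silo holds the file itself.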

[00:13:51] And we're also providing a way to move data to the right place at the right time because we have an engine that migrates data efficiently and also transparently moves data across architectures. So we can take something from file storage, put it in object storage like in the cloud, but make it look like it's still a file in the file storage. So the users don't even know that the data is sitting somewhere else. It's transparent to them.
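The general idea of moving a file to another tier while keeping its original path usable can be illustrated with an ordinary symbolic link. This is only an analogy under loud assumptions: a POSIX symlink between two local directories stands in for the file-to-object move described above, the `tier_file` helper and file names are made up, and Komprise's own transparent move mechanism is not shown here.

```python
import os
import shutil
import tempfile


def tier_file(path: str, archive_dir: str) -> str:
    """Move a file into archive_dir and leave a symlink at its original path."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)   # the data now lives on the "cheaper" tier
    os.symlink(target, path)    # a small link keeps the old path working
    return target


# Demo in a throwaway directory (POSIX symlinks assumed).
root = tempfile.mkdtemp()
primary = os.path.join(root, "primary")
archive = os.path.join(root, "archive")
os.makedirs(primary)
original = os.path.join(primary, "scan_0001.dat")
with open(original, "wb") as fh:
    fh.write(b"x" * 1_000_000)

tier_file(original, archive)

with open(original, "rb") as fh:   # users still open the same path
    bytes_read = len(fh.read())
is_link = os.path.islink(original)
print(bytes_read, is_link)

shutil.rmtree(root)                # clean up the demo directory
```

After tiering, the primary directory holds only a small link while the full file sits in the archive directory, which is the same shape as the "megabytes replaced by kilobytes of links" point made later in the conversation.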

[00:14:19] That's what our transparent move technology does. So we're trying to make it frictionless to manage data at scale. Wow, that is just incredible. And as someone with 3,100 podcast interviews, man, I've got a whole series of books I could be releasing right now. So I'm going to be checking that out. There you go. Hit them in sight. But many organizations right now, they're looking to optimize costs through intelligent data tiering.

[00:14:44] So how does Komprise's approach to storage as a service and departmental collaboration impact some of those enterprise IT strategies that we're hearing more and more about? Yeah, so a lot of people, as you said, their central IT teams are delegating more responsibility to departmental IT and they're offering storage more like a service. And one of the challenges with that is in a lot of organizations, they don't actually do chargeback.

[00:15:11] They don't actually tell people, hey, you have to pay for that storage. So people think it's free and they just keep storing more and more data. But the company has finite budgets, right? So how do you get your departments to know how much data they have, how much it's costing the business? How do you bring more visibility and reporting to it? And then what do you do? Because, I mean, just like on your phone, you probably have tons of photos you took a few months ago that you've never looked at.

[00:15:40] But you don't want to delete them either because you may go to grandma's in a few weeks and you may want to show her one of those pictures, right? So just like we hoard data as individuals, businesses also hoard data. And 80% of the data typically hasn't been touched in three months or longer. And yet it's consuming expensive resources. So Komprise shows by department how much of this data is cold. It lets you do showback to people.

[00:16:07] And then it lets departmental IT say, okay, this stuff, these projects, it's fine. You can tier that data off. It's all right. Or these things that you're showing as cold data, it's okay to reduce that cost. And then based on those policies, Komprise transparently tiers that data, again, without lock-in. Because if you used a storage-based tiering, you would get locked into that architecture. But the way we do it, it's without lock-in.
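A per-department showback of cold data, using the "not touched in three months" rule of thumb from the conversation, might be computed along these lines. This is a minimal sketch with made-up numbers: the `FileStat` type, the `showback` function, the 90-day cutoff as a stand-in for "three months", and the sample sizes and dates are all assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=90)  # assumed stand-in for "three months"


@dataclass
class FileStat:
    department: str
    size_bytes: int
    last_access: datetime


def showback(files: list[FileStat], now: datetime) -> dict[str, dict[str, int]]:
    """Per department: total bytes, and bytes not accessed within COLD_AFTER."""
    report: dict[str, dict[str, int]] = defaultdict(lambda: {"total": 0, "cold": 0})
    for f in files:
        row = report[f.department]
        row["total"] += f.size_bytes
        if now - f.last_access >= COLD_AFTER:
            row["cold"] += f.size_bytes
    return dict(report)


# Made-up sample data.
now = datetime(2025, 2, 6)
files = [
    FileStat("engineering", 800, datetime(2024, 6, 1)),
    FileStat("engineering", 200, datetime(2025, 1, 20)),
    FileStat("marketing", 500, datetime(2024, 12, 1)),
]
for dept, row in showback(files, now).items():
    print(f"{dept}: {row['cold'] / row['total']:.0%} cold")
```

A report like this is what gives a department something concrete to approve a tiering policy against.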

[00:16:33] And then you can actually do reports to management and say, look, engineering department actually helped us shave off $4 million of costs because they did this kind of data management. And so that's how a lot of businesses are using us. In fact, one of the largest cancer teaching hospitals in the United States has been successfully using us to do exactly this.

[00:16:58] And even their researchers, their cancer research scientists are able to tag data using our product. And then the data is getting tiered efficiently. And they're finding other value by tagging the data because they can also tag it based on the project they're working on or the grant that they have around it and what the subject is. So they're able to enrich the metadata for other applications as well, not just storage efficiency. I love that.

[00:17:27] And another big topic that we must highlight is ransomware resilience because that's also becoming a big priority for businesses right now. So how does your data management platform help organizations shrink their attack surface too and ultimately enhance that cyber resilience or improve their cyber hygiene too? Yeah, that's a great question, Neil. And I think a lot of people think data protection is something they need to do for their mission critical data.

[00:17:57] Everybody is painfully aware of the need to back up your mission critical data or provide ransomware defense on the mission critical data. But the problem is something like ransomware will attack the weakest link. So all that cold data that's sitting there that nobody's paying attention to, it's quite easy for some ransomware attack to go and infect one of those and just lurk for months and then suddenly manifest. Right?

[00:18:24] So you really want to, you know, the data that's cold, nobody's using, you kind of want to take it out of the attack surface because nobody's using this data. Why make yourself vulnerable? And so when Komprise tiers that data out, first of all, that data is no longer sitting on your active storage. So it's not exposed to the attacks the same way as data sitting on that storage is.

[00:18:50] But secondly, even if somebody were to follow a link and get to that data sitting in the cloud, what we do is we can put that data on immutable storage, meaning that even if something were to infect it, it would not really infect it, it would create a new copy of it. So you would be able to restore the older copy.
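The behavior described here, where a write creates a new copy and the older copy stays restorable, can be sketched as a tiny append-only store. This is a conceptual toy and not any cloud provider's API: the `ImmutableStore` class, its methods, and the object key are hypothetical names chosen for the example.

```python
class ImmutableStore:
    """Toy write-once store: every write appends a version, none is overwritten."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store data as a new version and return its version number."""
        history = self._versions.setdefault(key, [])
        history.append(bytes(data))
        return len(history) - 1

    def get(self, key: str, version: int = -1) -> bytes:
        """Read a version; the default is the most recent one."""
        return self._versions[key][version]


store = ImmutableStore()
v0 = store.put("contracts/2019.pdf", b"original contents")
v1 = store.put("contracts/2019.pdf", b"ENCRYPTED-BY-RANSOMWARE")  # simulated attack

print(store.get("contracts/2019.pdf"))      # the latest version is the bad one
print(store.get("contracts/2019.pdf", v0))  # the clean copy is still there
```

Because `put` never replaces an existing version, a malicious write can only add to the history, and recovery is a matter of reading an earlier version number.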

[00:19:11] So not only can we reduce the ransomware attack surface by 70% or more and shrink the backup costs by 70% or more, because we have taken megabytes of file data and replaced it with kilobytes of links. So we've shrunk the attack surface and the backup costs. But we're actually giving greater protection on the data by putting it on immutable storage.

[00:19:38] And we have customers, like one of our customers is this multinational law firm, and they use Komprise to do exactly this. They use us to put data into the Azure immutable blob to reduce their ransomware attack surface. I think they've saved millions of dollars every year by doing this, and they now have better ransomware protection than before.

[00:20:04] If we were to look ahead, how do you see unstructured data management evolving? And anything or any tips you could advise on what enterprises should be doing today to future-proof their data strategies? Because things are moving pretty quickly right now, aren't they? They really are. I mean, this may sound counterintuitive to you, but I feel like if I were to give tips to anyone, it would be to start with visibility.

[00:20:31] Because as basic as it sounds, you'd be surprised how many customers I talk to, and they know roughly how much data they have, but they really don't know who's actually using it or how fast it's supposed to grow or how fast is it going to grow next year or what's going to happen to its cost profile.

[00:20:56] And so I think that they're going to be able to do that. Keeping petabytes of data around if you're not going to get any value out of it.

[00:21:25] So there's so much untapped value in unstructured data. It's time to really start exploiting that value, especially with AI. So preparing your data for AI should be top of mind for every single infrastructure professional out there. A lot of times I think storage people don't think they have any role to play in AI, and that's not true.

[00:21:49] Data is the lifeblood for AI, and data governance for AI is absolutely a responsibility of that storage slash infrastructure person. Their role is changing. They're going to become a data steward. So they need to start thinking that way. So much food for thought in our conversation today. So thank you so much for taking the time to sit down and share your invaluable insights with everybody listening.

[00:22:14] But anybody that would like to take it a step further, maybe they want to find out more information about Komprise, or maybe they want to contact you or your team. Where would you like to point everyone listening? Absolutely, Neil. You can go to Komprise.com. That's the easiest. We have a lot of assets there that people can look at. And if they want to try it, if they want to say, hey, how would this look with my own data? We'd be happy to help them with that so they can just fill out a form on our website.

[00:22:42] So I would say go to our website and our LinkedIn company page. We post a lot of updates there as well. Awesome. Well, I'll add links to absolutely everything to make it nice and easy for people to find you. Thank you so much for hosting me at the IT Press Tour in Silicon Valley at your office. It's been lovely to meet you live in person. But more than anything, just thank you for sitting down with me today. Thank you so much, Neil. I really enjoyed it.

[00:23:07] I think we've covered just about everything you need to know about the challenges and the opportunities of unstructured data management. But one key takeaway really stood out to me from our conversation. AI is only as good as the data that it learns from. And if enterprises want to leverage AI effectively, they must get serious about data visibility, governance and cost optimization.

[00:23:31] And as Krishna pointed out, most companies are still making critical decisions in the dark when it comes to unstructured data. But Komprise's global file index, transparent data movement and AI-ready governance tools are indeed helping enterprises reduce costs, mitigate security risks and unlock new business opportunities. All without being locked into a single vendor. And again, massive point there.

[00:23:59] So if today's conversation resonated with you and you'd like to learn more, head over to Komprise.com or check out their LinkedIn page for more insights. But again, if you're just curious about how their approach could work for you and your business, they're offering tons of demos and trials. So don't hesitate to reach out and check that out too. But a big thank you to Krishna and the whole team at Komprise for hosting me in Silicon Valley during the IT Press Tour and also joining me on the podcast today.

[00:24:29] But over to you. What do you think? Is your company ready for the AI-driven data revolution? Let's keep this conversation going. Email me at techblogwriter@outlook.com, or find me on LinkedIn, Instagram, X, just at Neil C. Hughes. I'll keep a lookout for your emails and messages coming in. And I'll also prepare a guest for your listening pleasure bright and early tomorrow. Hopefully you'll join me again then. Bye for now.