3164: Breaking Data Silos: How Hammerspace is Powering AI Storage and Hybrid Cloud
Tech Talks Daily, January 29, 2025

As part of the IT Press Tour in Silicon Valley, I had the opportunity to sit down with David Flynn, CEO of Hammerspace, to explore how the company is redefining the future of enterprise data storage.

At a time when AI-driven workloads and hybrid cloud computing are pushing storage demands to new heights, Hammerspace is stepping in with a fresh approach. Their Global Data Platform is designed to eliminate data silos, providing seamless access and orchestration for unstructured data across edge locations, data centers, and cloud environments.

Hammerspace is experiencing a breakout year, with tenfold revenue growth driven by the surging demand for AI storage solutions and hybrid cloud infrastructure. The conversation delves into how the company is enabling enterprises to move beyond traditional storage limitations.

David explains the critical role of Parallel Network File System, or pNFS, in delivering high-performance storage solutions. We also discuss why it's gaining traction as organizations seek ways to process massive amounts of data with greater efficiency. Together, we explore the state of pNFS adoption, how Hammerspace is driving its evolution, and why it is a key pillar of its platform strategy.

Another important aspect of the discussion is the difference between global namespaces and true data orchestration. While global namespaces have gained attention in enterprise storage, David explains why simply having a global namespace is not enough.

Orchestration is the missing piece that allows data to be moved, accessed, and utilized efficiently across distributed environments. By combining orchestration with a truly global namespace, Hammerspace is making data instantly accessible to users, applications, and compute clusters, no matter where they are located.

As AI workloads become increasingly complex, the need for a new approach to data management is clear. Enterprises require storage solutions that can keep up with the demands of high-speed analytics, GPU-driven AI, and hybrid cloud strategies.

Hammerspace is positioning itself as a leader in this space, offering a way to bridge the gap between traditional storage architectures and the needs of modern enterprises.

The company's rapid growth is also reflected in its recent leadership expansion. With a new Chief Revenue Officer joining the team and Hammerspace earning recognition as Product of the Year in TechTarget's Storage Awards, the company is making significant strides in shaping the future of enterprise data storage.

With so much change happening in data infrastructure, what does the future hold for AI storage and enterprise data management? Will solutions like Hammerspace become the new industry standard for high-performance storage and orchestration?

I'd love to hear your thoughts on this. Reach out and share your perspective, and stay tuned for more insights from the front lines of technology.

[00:00:03] Welcome back to the Tech Talks Daily Podcast, where this week I find myself in San Francisco as part of the IT Press Tour. And today I've sat down with David Flynn, CEO of a company called Hammerspace. And together we're going to explore how they're redefining the future of data storage, AI-driven workloads and hybrid cloud computing. And it's looking to be a breakout year for Hammerspace.

[00:00:33] They've recently announced 10x revenue growth, fueled by the surging demand of AI storage solutions and hybrid cloud infrastructure. And their global data platform is transforming the way that enterprises access, manage and orchestrate unstructured data across edge, data centers and clouds.

[00:00:56] All while eliminating data silos and maximizing performance for AI, GPUs and high-speed analytics. And I also want to unpack the evolution of Parallel Network File System, or pNFS, something that's increasingly being seen as game-changing technology for high-performance storage.

[00:01:18] And I also want to learn how Hammerspace is pioneering data orchestration and global namespaces to meet the demands of modern AI and enterprise workloads. This feels like a pivotal moment for data-driven businesses. And Hammerspace appears to be at the forefront of solving the industry's toughest challenges. So what's next for AI storage and the future of enterprise data management?

[00:01:47] Well, let's dive into this topic with David Flynn, CEO of Hammerspace right now. So thank you for joining me on the podcast here at the IT Press Tour. Can you tell everyone listening a little about who you are and what you do? David Flynn is the name. I guess my biggest claim to fame was I founded the company Fusion-io that introduced the modern high-performance SSD and what became NVMe. Before that, I was doing high-performance computing.

[00:02:13] I built some of the first InfiniBand-based super clusters back when InfiniBand was a new thing. And yeah, since then, I've been, here and at Fusion-io, solving the issue of data access. And here we are now, fast forward to present day. Hammerspace is helping to transform how unstructured data is accessed and managed.

[00:02:39] But for people listening, hearing about Hammerspace for the first time, can you give me a bit of an overview of what the Hammerspace global data platform is? And some of the key problems that it ultimately solves for enterprises. Yeah, we simultaneously solve several of the long-standing intractable problems in the world of storage and data. But I'll go through a bit of a story if I may. Absolutely.

[00:03:05] So 15, 20 years ago, the industry attempted to bring supercomputer parallel file system architecture into the world of NFS and NAS. That effort failed because there wasn't a big commercial demand. It was still a niche. And the existing NAS vendors had their own file systems that weren't parallel file systems. So it was going to be a retrofit. And it was a design by committee. And you had half a dozen different operating systems that would have needed this very sophisticated capability.

[00:03:35] Because at that point, Linux had not yet consolidated the landscape. And besides, NFS had kind of stopped being innovated for a long time, right? We had a hard time as an industry getting from NFSv3 to NFSv4, right? So when I left Fusion-io, it was on a mission to say, hey, look, something is going to need to unlock this kind of crazy performance that NVMe can get, but at a file system level. And it needs to be standards-based. So the industry wasn't wrong in their desire to go there.

[00:04:04] It was just poor execution, poor timing. And I figured at some point, the appetite would arrive. And now AI really is that sort of personification of the need for lots of data really fast to feed a bunch of GPUs. And so I set out to make parallel NFS work and do so with the assumption that Linux would consolidate the landscape. So I recruited the kernel maintainer of the NFS stack inside of Linux, Trond Myklebust. He's our CTO. We introduced the new standard.

[00:04:34] We built the stuff that's in Linux, laid the groundwork. It's a new highway system. And that allowed us to deliver HPC class parallel file system performance from something that's plug and play and built in to Linux. But this is just step one. That architecture allows us to position data anywhere. Because now with a parallel file system, anything can be the storage node.

[00:05:01] So this allows us to introduce data orchestration, the concept of moving data across systems while it is still the same data. Because it's logically in the same namespace, even though it's physically on different storage. Because the namespace is no longer something that the storage provides. It comes from the parallel file system, the control plane that Hammerspace provides. So parallel NFS makes it possible to introduce data orchestration.
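To make the idea of a namespace that lives in the control plane rather than in the storage more concrete, here is a minimal toy sketch in Python. It is not Hammerspace's implementation; the `ControlPlane` class, its methods, and the store names are all hypothetical and exist only to show that a logical path can stay fixed while the bytes behind it move.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy namespace (hypothetical): maps a logical path to the store holding its bytes."""
    locations: dict = field(default_factory=dict)  # logical path -> store name
    stores: dict = field(default_factory=dict)     # store name -> {path: bytes}

    def write(self, path: str, data: bytes, store: str) -> None:
        self.stores.setdefault(store, {})[path] = data
        self.locations[path] = store

    def read(self, path: str) -> bytes:
        # Callers only name the logical path; the lookup hides where the bytes live.
        return self.stores[self.locations[path]][path]

    def move(self, path: str, dest: str) -> None:
        # Copy to the destination, repoint the namespace entry, then drop the old copy.
        src = self.locations[path]
        if src == dest:
            return
        self.stores.setdefault(dest, {})[path] = self.stores[src][path]
        self.locations[path] = dest
        del self.stores[src][path]


if __name__ == "__main__":
    cp = ControlPlane()
    cp.write("/projects/weights.bin", b"\x01\x02\x03", store="nas-onprem")
    cp.move("/projects/weights.bin", dest="cloud-object")
    # Same path, same bytes, different physical home.
    print(cp.read("/projects/weights.bin"), cp.locations["/projects/weights.bin"])
```

A real system would also have to keep the file readable and writable while the copy is in flight, which this sketch does not attempt.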

[00:05:28] And once you can orchestrate data, now you can present it in multiple data centers simultaneously. So it allows us to solve global file system. And it also allows us to incorporate existing data by reference. So you don't need to do data migrations. So these are, you know, four longstanding intractable problems that are all the fruits of actually delivering on what the industry tried to do 10 plus years ago. I hope that makes sense.

[00:05:57] But the bottom line is, we do for data what VMware did for the compute. We abstract data from the very storage storing it, allowing it to move freely across any of it. Just like VMware abstracted the virtual server from the physical host server, allowing it to move across physical hosts freely. And I say you did a brilliant job there of demystifying and simplifying that. And we will have a lot of business leaders listening as well as techies.

[00:06:24] And parallel network file system or pNFS, it does seem to be gaining attention as an emerging standard for parallel storage. So what is the current state of adoption and what kind of challenges are preventing that wider implementation? Are you seeing anything there or what are you seeing there? Yeah. Well, again, those 10 years ago, the industry largely turned their back on it and shelved it. Yeah. Again, because there wasn't a commercial demand.

[00:06:50] And it's really damn hard to do, especially if you're doing a design by committee from a bunch of companies that have a conflict of interest because they'd have to throw out their existing file systems. But now we've shown it can be done. And now that AI is driving demand for that kind of performance, you've got Dell with Project Lightning going to try to dust off that stuff. You're right. You've got VAST talking about building parallel file system into their stuff. And then you've got others. I've heard rumors.

[00:07:18] So everybody's going to be following. But what they don't realize is that HPC performance in a standard NFS based file system, that is step one. And then you have to do data orchestration, multi-site global file system, data assimilation.

[00:07:39] And more recently, we announced tier zero, the ability to have data orchestrated actually all the way into the very compute nodes that are using it as one tier out of the many. And so even if the industry gets there with building their own parallel NFS file systems, that's still I mean, that's what we were doing 10 years ago.

[00:08:05] Right. When everybody else stopped their efforts. Now it goes much, much further. And by the way, it turns out this notion of virtualizing the data and allowing it to exist anywhere and to be consumable anywhere, that's the bigger thing than the high performance. And people are still stuck on pNFS being a high performance thing. Yeah, I hope that makes sense. They're missing the point. It's about virtualizing data from the storage while unlocking massive performance, not just, OK, it's a pNFS enabled file system.

[00:08:33] And I'm going to slap pNFS on top of, you know, an existing NAS system and get there. You have to build the orchestration, you have to build the multi-site global namespace, you have to build the data assimilation capabilities, or you won't even have a migration path to get your data in. Yeah. And it feels like there's a real evolution going on here. And you've mentioned some of the things that you're doing. But how would you say Hammerspace is contributing to the evolution of pNFS and why it's such a critical component of your platform strategy right now?

[00:09:03] Well, you know, really what we're seeing is the resurgence of NFS as a thing. pNFS is just its successor. Regular NFS is dying. It's going to die. pNFS is going to pick up from that. By the way, the fundamental innovation is that it separates metadata and the metadata server from the data path to storage nodes, allowing you to have many different storage nodes in the same logical file system or namespace.

[00:09:30] Regular NFS, you talk to one server and it's both metadata and data combined, and it forces it into a monolith, which doesn't scale. Yeah. Okay. So what we are doing to help here, not only did we introduce the NFS 4.2 spec, but we have introduced the next, I don't know if it's going to be called 4.3 or 4.2 prime, whatever. But it has some really nifty innovations that are going to be transformative. For example, erasure coding.

[00:09:59] So now not only can you have many different storage nodes and parallel access and have data across all of them, but that data can be distributed, erasure coded with all of the benefits of deep erasure coding. So think of it as the capacity efficiency of object storage with the extreme performance of a parallel file system in the HPC world. Wow. Incredibly cool.
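As a rough illustration of why erasure coding gives better capacity efficiency than keeping full copies, the following Python sketch stripes data across `k` shards and adds one XOR parity shard. This is the simplest possible stand-in and an assumption on my part: the scheme proposed for NFS is not described in the conversation, and "deep" erasure coding would normally use a code with several parity shards (Reed-Solomon style) rather than single XOR parity. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards (zero-padded) and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: list[bytes], lost: int) -> bytes:
    """Rebuild the shard at index `lost` by XOR-ing all the surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    return reduce(xor_bytes, survivors)


def overhead(k: int, parity: int = 1) -> float:
    """Raw capacity consumed per byte of user data."""
    return (k + parity) / k


if __name__ == "__main__":
    shards = encode(b"parallel file system", k=4)  # five shards, one per storage node
    print(recover(shards, lost=2) == shards[2])    # any single lost shard is rebuildable
    print(overhead(4))                             # compare with 3.0 for three full copies
```

With four data shards and one parity shard, each byte of user data costs 1.25 bytes of raw capacity, while the shards can still be read from different nodes in parallel.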

[00:10:21] And if we do dare to look even further ahead, what role do you see pNFS playing in the future of data management, particularly in AI driven workloads? We've done well to get, what, 10, 15 minutes into our conversation without mentioning AI. But when we're talking about AI driven workloads with high performance computing, this is... The way I see it is there are fundamentally two important areas where it's transformative for AI workloads and other workloads too, by the way.

[00:10:50] And I think of it as concentric circles. The inner circle is, let's call it the close proximity, where it's feeding the GPUs. That's all about performance, scalable performance. That's the parallel file system part of parallel NFS. The last mile when it comes to the logistics of data. Now, the outer perimeter is what I'd call the long haul logistics. And that's the cross system, cross site movement of data.

[00:11:19] So if the inner one is a redefinition of what it means to be performant storage, the outer one is a redefinition of what it means to manage data. And in particular, it redefines data management, transforming it into data orchestration, which is not just a fancy new name for management. What it means is that the data movement is no longer disruptive to the ongoing access.

[00:11:44] Data can flow from one system to another, from one site to another, while it's being accessed continuously, because it's logically the same data, no matter where it is.

[00:11:56] So I think of those two perimeters, the inner perimeter, the outer perimeter, redefining high performance storage, redefining how data is managed to no longer be copying it between systems to where you're just creating forks in the existence of the data. And there are a lot of discussions around global namespaces in enterprise storage at the moment.

[00:12:23] So what are the key differences between, let's say, a global namespace and true data orchestration? And why is having both necessary for model? Let's be clear. The reason why there's a lot of talk about global namespace is because Hammerspace finally cracked the code and created one. And the other storage vendors are talking about theirs, but frankly, they're going about it the same old way that never worked, creating something that's synchronous. And they're actually trying to promote that as, hey, we're better than Hammerspace because our replication is synchronous.

[00:12:52] It's like, you have a little problem here called the speed of light and physics. And this was solved by the web scale guys by doing eventual consistency and being async. But there are complications. You have to build that into the file system from the beginning.

[00:13:09] It's easy to stretch a file system to span some distance, but stretching it so it can span to the other side of the globe and 16 different sites, keeping that synchronous ain't going to work. I mean, we have customers that are using us at a dozen-plus sites: Epic Games, Blue Origin, the guys building spaceships and the guys building video games, literally 12 different sites that are all the same file system.
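The "speed of light" objection can be put into numbers with a short back-of-the-envelope calculation. The Python sketch below is only a lower bound under stated assumptions: light in optical fiber is taken to travel at roughly two thirds of its vacuum speed, the cable is assumed to follow the shortest path, and switching, queuing and storage latency are ignored. The function names and the 20,000 km example distance are illustrative, not taken from the conversation.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3         # assumption: signal speed in fiber is about 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on one request/acknowledge round trip over fiber."""
    one_way_s = distance_km / (C_VACUUM_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def max_serialized_sync_ops_per_s(distance_km: float) -> float:
    """Upper bound on dependent operations that each wait for a remote acknowledgement."""
    return 1000 / min_round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (100, 4_000, 20_000):  # metro, cross-continent, far side of the globe
        print(km, round(min_round_trip_ms(km), 2), round(max_serialized_sync_ops_per_s(km), 1))
```

For a site about 20,000 km away, the bound works out to roughly 200 ms per round trip, so a chain of operations that each wait for a synchronous acknowledgement cannot exceed about five per second, no matter how fast the storage at either end is.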

[00:13:36] And by the way, the other key is, so number one, it can't be synchronous. You can always add a synchronous layer of locking on top if you want, but that'll slow everything down and so forth. You have to build asynchronous first and then synchronous is layered on top. And we do that with file reservations. You can have your synchronous for the stuff if you want. It's just fundamentally that you have to breach the async barrier and file systems have never done that.

[00:14:02] But because we did and it succeeded, now people are giving credibility to the Johnny-come-latelies who are claiming they have one. And what we've heard is it's not working for folks because of the synchronous nature. It is clouding the market a bit, but it makes people go and look. So it's free marketing for us because they will have to look for the one that works. So global namespace is a big deal, you know, for that reason. I will point out the second thing.

[00:14:27] It's not only just that you have to implement something that's truly async, even if you want to put a synchronous locking system on some stuff after the fact. Because if it's synchronous, it cross-couples them from a failure perspective. If the site that had the lock is offline, you're screwed. Yeah. Right. So they're all coupled to each other from a reliability perspective: any one that's down and they're all screwed. Or from a performance perspective, because you have to go and talk to all of them to get anything done. That won't work. It has to be asynchronous, eventually consistent.
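The failure and performance coupling described here can be sketched with two small formulas. This Python example assumes that site outages are independent and that a synchronous commit must hear back from every site; the 99.9% availability figure and the round-trip times are made-up illustrative inputs, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def coupled_availability(site_availability: float, sites: int) -> float:
    """If every site must be reachable for work to proceed, availabilities multiply."""
    return site_availability ** sites


def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * HOURS_PER_YEAR


def sync_commit_latency_ms(round_trips_ms: list[float]) -> float:
    """A commit that waits for all sites is as slow as the slowest round trip."""
    return max(round_trips_ms)


if __name__ == "__main__":
    single = 0.999                             # assumed availability of one site
    coupled = coupled_availability(single, 12)
    print(round(coupled, 4))                   # about 0.9881 for 12 coupled sites
    print(round(downtime_hours_per_year(single), 1), round(downtime_hours_per_year(coupled), 1))
    print(sync_commit_latency_ms([2.0, 35.0, 180.0]))
```

Under these assumptions, twelve sites that each manage 99.9% availability give a synchronously coupled system only about 98.8%, roughly twelve times the downtime of a single site, and every commit pays the latency of the farthest site.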

[00:14:57] Second thing is you need data orchestration, because in a world where you're distributing data over large distances, you have to be able to talk about the layout of data, the placement and movement of data, in an intentional fashion, not just in a reactive fashion. So again, you can't just take a regular NAS system and stretch it, or else there's no way to talk about the need to pre-position data, under what circumstances it needs to be where, to set up workflows and so forth.

[00:15:27] And so those are the two key things being intentional about, um, where the data is placed and when it's moved so that it can be there in advance proactively versus reactive. Cause if it's reactive, you're screwed and you can't have everything everywhere. That would be very costly. So we have to have data orchestration and you have to have parallel file system, um, before you can make a true global namespace. And then you have to additionally tackle the issue of eventual consistency and, and true async operation.

[00:15:58] And with AI, GPUs and high speed data analytics further driving demand for extreme performance as standard, how do you at Hammerspace eliminate things like data silos and ultimately ensure that seamless access across edge, data centers and cloud environments? It sounds simple when I say that out loud, but it's incredibly complex.

[00:16:21] That wasn't because if you talk to people who are not versed in the IT world, they might think that this is a solved problem because it's such a self-evident thing. Like, of course you would want that. Like I want to be able to have a file share that I can dump files in here and read them from over there. And I can change them here and I can read them from over there. Right.

[00:16:44] This geographically dispersed data environment, we call it the global data environment, that's never existed, because we needed data orchestration. And we needed a truly async global namespace or global file system. And you can't build those without a standards-based parallel file system, and pNFS got shelved and died a decade ago. Yeah.

[00:17:09] So therefore these other things never came to be because the linchpin for making them possible was separation of metadata and the control plane from the data path. So you can have a parallel concurrent data path to, uh, any storage. Now I can make the storage interchangeable and move it because the metadata is separate. The what is separate from where the data is. So now I can change where the data is without it changing what the data is.
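The separation of the control plane from the data path can be modeled in a few lines of Python. This is a toy, in-memory sketch and not the NFSv4.1/4.2 wire protocol: the `MetadataServer`, `StorageNode` and `parallel_read` names are hypothetical, the round-robin striping is an assumption, and the writes go through the metadata object only to keep the example short (a real pNFS client talks to the storage nodes directly for data).

```python
from concurrent.futures import ThreadPoolExecutor


class StorageNode:
    """Holds raw stripes; knows nothing about file names or the namespace."""
    def __init__(self, name: str):
        self.name = name
        self.chunks = {}

    def read(self, key: str) -> bytes:
        return self.chunks[key]


class MetadataServer:
    """Owns the namespace and the layout (which stripe lives on which node)."""
    def __init__(self, nodes: list, stripe_size: int):
        self.nodes = nodes
        self.stripe_size = stripe_size
        self.layouts = {}  # path -> [(node, stripe key), ...]

    def create(self, path: str, data: bytes) -> None:
        layout = []
        for offset in range(0, len(data), self.stripe_size):
            idx = offset // self.stripe_size
            node = self.nodes[idx % len(self.nodes)]  # round-robin placement
            key = f"{path}#{idx}"
            node.chunks[key] = data[offset:offset + self.stripe_size]
            layout.append((node, key))
        self.layouts[path] = layout

    def get_layout(self, path: str) -> list:
        return list(self.layouts[path])


def parallel_read(mds: MetadataServer, path: str) -> bytes:
    layout = mds.get_layout(path)  # one metadata request ...
    with ThreadPoolExecutor(max_workers=max(1, len(layout))) as pool:
        # ... then every stripe is fetched straight from its node, concurrently.
        stripes = pool.map(lambda loc: loc[0].read(loc[1]), layout)
        return b"".join(stripes)  # map() preserves stripe order


if __name__ == "__main__":
    nodes = [StorageNode(f"node{i}") for i in range(3)]
    mds = MetadataServer(nodes, stripe_size=4)
    mds.create("/data/sample.bin", b"0123456789abcdef")
    print(parallel_read(mds, "/data/sample.bin"))
```

Because the layout is the only place that records where stripes live, a stripe could be relocated to another node by updating that one table entry, which is the "change where the data is without changing what the data is" point made above.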

[00:17:38] And we're talking here as part of the IT Press Tour in San Francisco, where earlier today you revealed 10x revenue growth fueled by AI storage and hybrid cloud computing demand. It feels like a real breakout year for you, especially when combined with Hammerspace recently being named, I think it was, Product of the Year in TechTarget's Storage Awards. We are the most awarded in the world of tech. And it's because this is such a fundamental shift in mentality.

[00:18:07] I mean, if you think about it, data today exists as an artifact rendered from storage. And you make it permanent by putting it into storage, by putting it to rest, by killing it. In a world where data is orchestrated, it lives in motion, it can move freely across things, and it can actually outlive any of the storage. So think about it.

[00:18:32] It's really a bizarre concept that data orchestration makes data more permanent, because now it's not bound to the lifetime of any one specific piece of storage and you can access it from anywhere. So we've had a breakout year and I'll tell you why: because the claims that we make are so bold for anybody who is indoctrinated in the industry and knows how hard this stuff is to build. 'Cause like I said, the novices, they think, why didn't it exist before? Yeah. Right.

[00:18:58] But anybody who knows the attempts to make global file systems in the past, or to make a data virtualization layer for file storage or whatnot, they know that all those attempts have failed, by the way. Again, 'cause parallel NFS didn't exist; it failed in its creation. So yeah, we had a breakout year, but it's largely because we got enough proof points with companies that, I mean, you can't ignore Meta training Llama on it.

[00:19:27] So you're going to say parallel NFS doesn't have the performance you need? I don't think so. It can compete with Lustre. It can compete with all of those things and actually have better performance. So it took time to get the proof points into place, and then the dominoes start to fall.

[00:19:46] And, you know, that's really been the case: a lot of people have been watching our attempt to use parallel NFS and assuming that it would fail, just like with the rest of the industry. And that is actually our biggest impediment to more rapid sales, the presumption that because the industry couldn't do it, it wasn't doable.

[00:20:12] And I would argue that was the same kind of thing that I had when I built the first high performance SSDs at Fusion-io. People didn't think that there was a place for them. I said, it's all about timing. Yeah. Right. And now NVMe is a half a trillion dollar a year industry. Right.

[00:20:30] And so a standards-based parallel file system that enables data to be orchestrated across systems and presented globally across different data centers in a global namespace, the time for that is now. It's finally arrived. And the technology maturity is there, thanks to the groundwork being laid with the parallel NFS work in the standards body and in Linux. I mean, even Linux as old as

[00:20:56] RHEL 7 has everything that's needed already in it, because we put it there some seven years ago. And I can't let you go without also discussing another big announcement in the last week. And that is a new chief revenue officer at Hammerspace. Again, it seems to further highlight you're going through a significant growth phase at the moment. So can you tell me a bit more about that? What's next for the company and how you see all this evolving to meet the demands of AI and enterprise data needs? Feels like there's a big opportunity here. Oh, absolutely.

[00:21:26] We're so excited to have Jeff Giannetti join the team. You know, he is familiar with the problem space, coming from a company that was selling high performance file. Now to be able to solve that along with data orchestration, global namespace, data assimilation, tier zero, you know, that adds a lot more to the repertoire than just having an exotic high performance file system. So I think he's going to hit the ground running. We're super excited about that.

[00:21:54] And like I said, it took a lot for us to garner the proof points with actual deployments. And now that those are there, you know, we're going to see quite a bit of acceleration. And for anyone listening wanting to learn more about your future growth or what you're working on, tier zero, that's something I encourage everyone to check out. But where would you like to point everyone listening? Oh, please come to our website, hammerspace.com. At the bottom, there's call to action, lots of stuff to explore there.

[00:22:24] So you'll be able to find lots of things there and, you know, reach on out. Happy to happy to have you. Well, I will get links added to everything so people can find you nice and easy. And I'm very conscious of seeing you whizzing around today. You've been talking all day and I've even had to grab you from the buffet just to come here and speak with me. But thank you so much. It's my pleasure. Thank you for having me. Look forward to next time.

[00:22:46] Big thank you to David Flynn for diving deep into how Hammerspace is transforming unstructured data management for everything from AI and hybrid cloud to high performance computing. Their global data platform is attempting to revolutionize data access, ensuring that enterprises can seamlessly orchestrate, manage and scale their storage needs across any environment.

[00:23:12] And today's discussion around parallel network file system or pNFS was something I found particularly insightful. It's a technology that's becoming more relevant than ever, especially as AI workloads demand faster, more scalable storage solutions. And when you combine this with Hammerspace's record growth and recent industry recognition, it's clear that they're not just riding the AI wave. They're building the infrastructure that powers it.

[00:23:41] So the big question is, how will enterprises keep up with the growing complexity of AI driven data? Will solutions like Hammerspace's become the industry standard for high performance storage and orchestration? I'd love to hear your thoughts on this. Email me now, techblogwriter@outlook.com. X, Instagram, LinkedIn, just at Neil C. Hughes. Let me know your thoughts. I'll be back again tomorrow with another guest.

[00:24:10] But as always, thank you for listening. And I will speak with you all bright and early tomorrow. Bye for now.