How DDN And NVIDIA Are Rethinking AI Infrastructure For The Rubin Era
Tech Talks Daily · March 24, 2026


What does it really take to turn a massive AI infrastructure investment into actual business value?

In this episode, I'm joined by Alex Bouzari, co-founder and CEO of DDN, for a conversation that gets right to the heart of where AI infrastructure is heading next. There is a lot of noise in the market about faster chips, larger models, and bigger data centers, but Alex argues that the real story has changed. According to him, GPUs are no longer the main constraint. The true bottleneck now lies in the data layer, where data is moved, cached, served, and managed across increasingly complex AI environments.

That shift matters because many organizations are still thinking about AI in terms of hardware acquisition. Buy more GPUs, add more power, build more capacity. But as Alex explains, that mindset misses the bigger picture.

If your data architecture cannot keep pace, those expensive systems stall, efficiency drops, and the return on investment quickly becomes shaky. It was a timely discussion, especially as NVIDIA's Rubin platform points toward rack-scale AI factories where compute, networking, storage, and offload all need to work together as one operational system.

One part I found especially interesting was Alex's focus on measuring efficiency. He argued that the future winners in AI will not simply be the companies with the most hardware. They will be the ones who think like industrial operators, measuring cost per token, rack utilization, time-to-value, and power consumption per unit of intelligence output. That is a very different conversation from the hype cycle, and it is one that business leaders need to hear. AI value is no longer about showing that something can work. It is about proving that it can work predictably, securely, and economically at scale.
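
To make that scorecard a little more concrete, here is a rough sketch of how those metrics might be derived. Everything below is illustrative: the function name, inputs, and figures are made-up examples, not DDN or NVIDIA numbers.

```python
# Rough sketch of the "industrial operator" scorecard described above.
# All inputs are hypothetical examples, not DDN or NVIDIA figures.

def ai_factory_scorecard(tokens_served, gpu_hours_used,
                         gpu_hours_available, total_cost_usd, energy_kwh):
    """Derive the efficiency metrics mentioned in the episode."""
    millions_of_tokens = tokens_served / 1e6
    return {
        "cost_per_million_tokens_usd": round(total_cost_usd / millions_of_tokens, 2),
        "rack_utilization_pct": round(100 * gpu_hours_used / gpu_hours_available, 1),
        "kwh_per_million_tokens": round(energy_kwh / millions_of_tokens, 2),
    }

# Example: one month of inference on a hypothetical deployment.
print(ai_factory_scorecard(tokens_served=3.2e11,      # 320 billion tokens
                           gpu_hours_used=180_000,
                           gpu_hours_available=260_000,
                           total_cost_usd=1_450_000,
                           energy_kwh=950_000))
```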

We also talked about DDN's collaboration with NVIDIA, the role of BlueField-4 DPUs, and why inference performance now depends on intelligent memory architecture and data movement just as much as raw compute. Alex shared how DDN is helping customers reach up to 99 percent GPU utilization and reduce time to first token for long context workloads. Those numbers are impressive on their own, but what matters most is what they represent—better throughput, lower waste, and AI systems that move from science project to production reality.

There is also an important leadership lesson running through this conversation. DDN has been profitable for over a decade, powers more than one million GPUs worldwide, and has built its business by staying close to real customer pain points. Alex speaks with the kind of clarity that comes from building through constraints rather than simply talking around them.

If AI factories are going to define the next phase of enterprise technology, how should leaders rethink infrastructure, efficiency, and value creation before they invest in the next wave, and what do you think?


[00:00:04] Have you ever noticed how the AI conversation has slowly shifted from who has the most GPUs to who can make the whole thing work day in, day out, without wasting a fortune? Well, my guest today is the co-founder and CEO of DataDirect Networks. But you probably know them better as DDN because they are a leader in AI and high-performance computing data storage, or HPC.

[00:00:34] The world of tech does love a good acronym. But if you take a peek behind the curtain, they're a company that sits in the engine room of modern AI infrastructure. So if you've heard the phrase, no data, no AI at a conference lately, this is a conversation that puts real meaning behind it. Because Alex will make the case that the bottleneck has moved. The issue is no longer simply getting access to compute.

[00:00:59] It's actually about making data move predictably and efficiently across a system. So your GPUs are not sitting there idle while the power bill quietly climbs. And we will talk today about the shift toward rack-scale AI factories, why inference economics is now the real battleground, and why time to first token is one of those metrics that sounds technical

[00:01:24] until you realise it is the difference between something your customers can actually use and something that gets stuck in the pilot mode forever. So if you're building AI seriously, or even just trying to make sense of why so many projects are stalling, this one's for you. And with that, let's get into it. How do you turn a massive AI infrastructure investment into business outcomes you can measure? I'm not the guy to give you that answer. I'm going to officially introduce you to my guest right now.

[00:01:55] So thank you for joining me on the podcast today. Can you tell everyone listening a little about who you are and what you do? Absolutely, Neil. Thank you very much for having me. So my name is Alex Bouzari. I'm the founder and CEO of DDN. And what DDN does is we've created and we've brought to market a data intelligence platform which industrializes AI.

[00:02:20] And that means we help the world's economy create business and financial value out of their AI investments. So, I mean, we power more than a million GPUs worldwide from NVIDIA to xAI. I mean, sovereign nations, thousands of enterprises worldwide. And basically trying to take all of these massive investments that have been made in the world of AI in data centers and GPUs and so on

[00:02:49] and create value for customers out of it. Awesome. And I did meet your CEO, several members of your team. And I also interviewed, I think it was Amanda Lee from DDN as well as part of the IT press tour. Do you know Amanda there? Yes, just talking to her a few minutes ago. So yeah, she's part of the marketing team. And NVIDIA's Rubin, that is something that was recently revealed, signals somewhat of a shift towards rack-scale AI factories rather than accelerator-centric systems.

[00:03:19] Feels like quite a big shift here. But from your perspective, what does that architectural change mean for enterprises that are actually building serious AI infrastructure now? Not only now, but over the next five years too. I mean, that's a great question. I mean, I think Rubin is when really AI stopped being about chips and started being about systems and delivering value from those systems.

[00:03:45] But more importantly, I think Rubin is also a significant shift in the power requirements and data center complexity in implementing these very, very powerful GPUs. Because they're much more demanding than they used to be. When you move to rack-scale AI factories, well, you have liquid cooling needs that are now essential. They're no longer just nice to have.

[00:04:12] You're talking about upwards of 200 kilowatts per rack. And so imbalance and inefficiencies really become lethal. And so if your data layer can't keep up, your GPUs don't just slow down, they stall. And when they stall, you're basically burning very, very expensive megawatts of power and capital without producing value.

[00:04:40] So I think that's really very important. I think what enterprises will realize over the next several years is that it's not owning the GPUs, which delivers value and competitive advantage. It is operating them predictably and efficiently. And this is, I think, the first pivot. We're hearing it from customers. We're even hearing it from NVIDIA. Increasingly, it will become complex to deploy and operate these GPUs,

[00:05:09] which means making the whole thing predictable and efficient will be key. And it all goes back, I think, to the data because you cannot do AI without data. Yeah, that's a phrase I'm hearing more and more at tech conferences. No data, no AI. And when I was doing a little research on you, I was reading how you've argued that GPUs are no longer the primary constraint here. So the question I've got to ask is if compute is no longer the limiter,

[00:05:36] where exactly does data movement begin to set the speed limit almost across modern AI platforms? Sure. I think people are still thinking that the scarcity of compute is the problem. And, you know, we've run lots of queries out there. And what our data says is completely different. I mean, 65% of all AI infrastructure is sitting idle.

[00:06:04] 99% of customers are reporting inefficiencies. And more than 50% have delayed or canceled AI projects because of that. So it's not about a compute shortage. I mean, it's really a systemic imbalance, which is coupled with just unpredictable deployment and operational patterns.
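
If you want to sanity-check those idle-infrastructure figures against your own fleet, a minimal sketch like the one below is one place to start: it simply samples utilization and power draw via nvidia-smi. The 50% "busy" threshold is an arbitrary example, not a DDN or NVIDIA benchmark, and it assumes the NVIDIA driver tools are installed.

```python
# Minimal sketch: poll nvidia-smi for per-GPU utilization and power draw
# to see how much of a fleet is actually busy at a point in time.
import subprocess

def sample_gpus():
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    for line in out.strip().splitlines():
        idx, util, power = [v.strip() for v in line.split(",")]
        busy = float(util) >= 50  # arbitrary "doing useful work" threshold
        print(f"GPU {idx}: {util}% util, {power} W {'busy' if busy else 'IDLE'}")

if __name__ == "__main__":
    sample_gpus()
```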

[00:06:25] And so the speed limit of AI today, I believe, is how fast, predictably, and intelligently data moves across these distributed infrastructures, from the edge to the data center into clouds and multi-clouds.

[00:06:42] And if your data movement is chaotic, well, your millions, tens of millions or billions of dollars that have been invested in AI strategy just becomes a very, very expensive science experiment. So the bottleneck has really moved from silicon to architecture. And you can no longer just wing it. You have to operationalize it. You have to industrialize it.

[00:07:08] And for that, it's really the data part of it which needs to be optimized. I mean, what I always say is that DDN is to data what NVIDIA is to compute. And to really create value out of AI, these two worlds have to come together. Compute and data have to come together. And efficiency needs to traverse the compute to the data, the edge to the data center, to the multi-cloud. And I've got a big stat I wanted to share here.

[00:07:38] DDN and NVIDIA have reported up to 99% GPU utilization and a 20% to 40% reduction in time to first token for long context workloads. I mean, what engineering changes made these massive gains possible? And why do they matter for real-world inference at scale? Again, another word we're hearing a lot this year. Yeah.

[00:08:02] So, I mean, as you correctly pointed out, I mean, the value creation has moved into inference. I mean, it used to be people were focused on the models. Let's train the models. Let's make the models more efficient. Well, then people said, okay, now that we have these great models, we need to get good answers from the models. And that's where inference kicks in.

[00:08:27] I mean, inference basically gets you the right answer so that you can deliver better products, better services, better optimization on your factory floors, better, faster drug discovery and bringing drugs to market. And most AI systems have been optimized for peak benchmarks. Well, peak benchmarks do not translate into industrial and operational reality.

[00:08:56] So we've basically optimized for sustained economic output. I mean, we've redesigned the data path so that GPUs don't wait. They're just operating at 95% to 99% efficiency 24-7. We've tiered KV cache intelligently for these long context workloads.

[00:09:19] A long context workload basically gives you the ability to keep more data, more content into short-term memory, which then gives you better answers, more cost-effectively. We've pushed the overhead into DPUs so that the CPUs can stop just babysitting the storage. And we've really engineered the data plane for AI concurrency, which is what is required.
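
As a rough illustration of the tiering idea Alex describes, here is a toy placement policy that keeps hot KV-cache blocks near the GPU and spills colder ones outward. The tiers, thresholds, and block structure are invented for the sketch; this is not DDN's or NVIDIA's actual data path.

```python
# Toy illustration of KV-cache tiering for long-context inference: keep the
# hottest context blocks in GPU memory, spill warm blocks to host DRAM, and
# park cold blocks on NVMe. Conceptual sketch only; thresholds are invented.
import time
from dataclasses import dataclass

@dataclass
class KVBlock:
    block_id: int
    last_hit: float   # timestamp of the most recent reuse
    hits: int         # how often this context block has been reused

def place(block, now):
    """Pick a tier from recency ('temperature') and reuse ('importance')."""
    idle_s = now - block.last_hit
    if idle_s < 5 or block.hits > 100:   # hot: actively reused context
        return "gpu_hbm"
    if idle_s < 60:                      # warm: likely to be needed again soon
        return "host_dram"
    return "nvme"                        # cold: retained, but off the fast path

now = time.time()
for b in [KVBlock(1, now - 1, hits=250),
          KVBlock(2, now - 30, hits=12),
          KVBlock(3, now - 600, hits=2)]:
    print(f"block {b.block_id} -> {place(b, now)}")
```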

[00:09:44] Now, the reason why all of this matters is because the time to first token is really the difference between a usable production-grade AI system and something that is a science project. And at scale, just 20%, 30%, 40% faster inference is not just improving latency. It's dramatically lowering the cost per token, which makes these AI investments viable and makes the ROI pencil out.
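
A quick back-of-the-envelope example of why that matters economically: the GPU price and throughput below are invented for illustration, and only the 30% speedup mirrors the range discussed here.

```python
# Back-of-the-envelope illustration of why faster inference lowers cost per
# token. The $/hour and tokens/sec figures are hypothetical examples.
gpu_cost_per_hour = 4.00          # assumed fully loaded $/GPU-hour
baseline_tokens_per_sec = 1_200   # assumed sustained throughput per GPU

def cost_per_million_tokens(tokens_per_sec):
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1e6

base = cost_per_million_tokens(baseline_tokens_per_sec)
faster = cost_per_million_tokens(baseline_tokens_per_sec * 1.30)  # 30% faster
print(f"baseline:   ${base:.2f} per 1M tokens")
print(f"30% faster: ${faster:.2f} per 1M tokens "
      f"({100 * (1 - faster / base):.0f}% cheaper)")
```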

[00:10:12] Because at the end of the day, the ROI has to pencil out. I mean, people are investing in AI infrastructure in order to create value. So the value has to be there. And so the economics associated with the whole platform has to work out. It's really strategic. I don't think it's incremental. Wow. So as context windows will inevitably stretch into the million token range and inference becomes

[00:10:37] distributed, how does data caching, placement, and KV memory tiering, how do these things influence both performance and cost? Because for the business leaders listening, that's their language, right? Yeah. Yeah. I mean, for business leaders, I think, you know, so we talk to lots of, you know, CEOs and CDOs and CFOs. And, you know, what's really most important to them, it's okay.

[00:11:02] If I were to invest in AI infrastructure, whether it's in my own data centers or in the cloud or at the edge, depending on the type of business they're in, how do I make sure that these investments will result in significant benefits, both business and financial for what I do? And I think what this long context AI has exposed is a very harsh reality.

[00:11:30] Memory architecture is now strategic. I mean, we've heard it from Jensen at CES, you know, a couple of months ago. He said, okay, the bottleneck, the value portion of all of this has now shifted to the data. I mean, the KV cache really becomes, I think, the working set of intelligence. So if you mishandle it, you're creating significant waste. And if you're creating significant waste, I mean, it's like operating a hotel where your

[00:11:58] rooms are consistently empty. Well, it piles up. And so you end up losing money. And so the economics don't work out. I mean, GPU waste is not something that is visible on PowerPoints, but it shows up immediately in your cost per token and your power consumption, because, you know, that's the bill that shows up at the end of the month, your power bill, your cost per token bill.

[00:12:23] So we tier context based on temperature, importance, and behavior. And so we keep what really matters very close to the GPU and the GPU memory. And we manage the rest without stalls. Again, that gives you consistent and predictable behavior with the highest level of efficiency in the GPUs. And so it makes the ROI work significantly better. So that's how you really extend context without destroying economics.

[00:12:51] I mean, large models don't guarantee advantage. It is efficient memory architecture and deployment in production that creates that for you. And NVIDIA's BlueField-4 is a next-generation data processing unit, or DPU, which is set to introduce deeper offload for networking, storage, and security. But how does integrating DPUs into the data layer, how's that reshaping the balance between

[00:13:17] CPUs, GPUs, and storage in almost a unified AI stack? For people that don't know too much about that, can you just expand on that for me? Sure, sure. I think that's a great question. So think of it as GPUs compute, DPUs manage the movement and the control, and CPUs orchestrate. So when you bring all of this together, if you're not thinking of it and you're not aligning these

[00:13:47] things properly, if CPUs are managing metadata and isolation and security overhead and storage interrupts, you're basically misallocating resources. And that takes us back to an inefficient system. And so your ROI will not work out. And so what's important, I think BlueField-4 from NVIDIA validates something that we've believed in for many, many years.

[00:14:14] I mean, just stop wasting very expensive silicon on the wrong tasks. I mean, expensive silicon should be pointing only at those tasks which are essential to it and require it. So again, going back to predictability, it is predictability in running the models, in optimizing the models, and in delivering highest efficiency in time to first token and in value creation

[00:14:42] from inference, which allows AI factories to scale and to deliver business and financial value without collapsing under complexity. Increasingly, I mean, it's really not about tuning infrastructure. It's really about industrial engineering. Again, think of it as, you know, another one of my favorite examples is if you're going to build a high-rise and that high-rise has 100 floors.

[00:15:09] Well, a 100-story high-rise is not fifty 2-story homes piled up on each other. And so the industrial engineering of this is think of the business and financial outcome, which you're trying to deliver in your line of work, whether it's products or services or what it might be, and then industrially engineer it. It's back to first principles. It's what Elon always says. Take it back to first principles.

[00:15:39] Do not optimize something first. First, understand what the business and financial outcome is that you're trying to derive, simplify it as much as you can, and then optimize it. And I think that's where we are in AI today. We live in an age where we're pivoting into the industrialization of AI. And that means value creation for enterprises, for nations, and for consumers from AI. This is the pivot.

[00:16:07] We've shifted from a time of investment, hundreds of billions of dollars and trillions of dollars invested in building data centers and populating them with GPUs and bringing power to them. So that is an investment phase. Now we're into a value creation phase. And that value creation phase comes from bringing data and compute together.

[00:16:32] And I think one of the exciting aspects of this year is enterprises are now moving from experimentation to production AI. And I'm curious, from your vantage point here, what new governance and isolation requirements are going to emerge at that stage? And how are you helping customers secure data end-to-end while also maintaining performance? Bit of a balancing act and a lot of customers coming to you and asking for help, I would imagine here. Totally.

[00:17:00] I mean, it is a balancing act because on the one side, I mean, performance variability, lack of predictability becomes a P&L issue. I mean, typically you will have multiple models, multiple teams within the organization. You will have sensitive data. In many cases, there is regulatory scrutiny, there is shared infrastructure.

[00:17:23] And so if governance and isolation and quality of service aren't natively built into the data plane, into the data intelligence platform, well, complexity explodes. And when complexity explodes, well, you end up with significant inefficiencies, which takes us back into the ROI doesn't pencil out. So that's problem number one. Problem number two, even if you said, well, I don't care. This is something I need to do.

[00:17:52] As complexity grows, the people you need in your organization to handle that complexity, well, there are very few people out there who know how to handle complexity in the context of AI. So you cannot find those people. You cannot hire them. And so again, the whole thing collapses. It's complexity which kills AI ROI. Production AI requires predictability and simplicity at scale. You just can't do it without that.

[00:18:19] And that's really why, you know, we at DDN built the data intelligence platform. It's really to reduce operational inefficiencies and unpredictability as the environments grow. Because the environments are growing. Data and inference is increasingly becoming multimodal, which means it's not just text and numbers. It's images, it's video, it is audio.

[00:18:47] And all of this has to be part of the mix. Well, you know, once you start dealing with images and video, you know, if you're a car company doing, you know, autonomous cars or sensor data, or if you have cameras for security and surveillance that have been deployed in shopping centers or in airports and so on. Well, in all of these cases, you're dealing with video. And video is not something that you can move easily because video has very high gravity.

[00:19:16] It's very expensive to move it around. It's very difficult to handle it in real time. And so you need a data intelligence platform that doesn't move the data around, but moves the attributes of that data around, what we call metadata. And that's what we've done. I mean, when we started out, NVIDIA came to us eight years ago and said, hey, DDN, this is what we need you to do.

[00:19:41] We want you to showcase the capabilities that you can bring into our GPU-enabled solutions. From the get-go, we said, well, it will be multimodal. There will be images. There will be video. When you do a query, you know, on your chatbot or your favorite chatbot, again, increasingly people are pulling in video and images into it. And I think that's what's really important.

[00:20:08] Predictability and scalability at high levels of efficiency for enterprises, for nations, and for consumers is what will make AI get accelerated and what will make AI an industrialized reality. And one of the things I wanted to highlight today, one of the things I read before you joined me, is DDN has powered more than 1 million GPUs worldwide,

[00:20:34] and you guys have been profitable for well over a decade now. So what have you learned about operational discipline that startups in AI infrastructure space and founders that might be listening today often overlook? Because it's phenomenal what you've achieved here. Thank you so much, Neil. Well, look, I think, I mean, the AI market has a lot of storytelling built into it. I mean, we've always focused on first principles.

[00:21:00] And first principles is really value creation and value delivery. So being profitable means that everything we do, every technology that we create and design and bring to market, the one thing we focus on is it has to maximize customer value in business terms and or financial terms. That's the only way. Now, we do that with technology because, you know, after all,

[00:21:29] we're a technology company, we're a software technology company, and the value we deliver really stems from the technology that we create. But this technology must deliver the absolute best business and financial outcomes for our customers. And it has to do so across industries and use cases and nations and consumers and so on. So we design around real constraints. Well, if there aren't enough GPUs out there, if there are power ceilings,

[00:21:57] if there is memory and SSD supply volatility as we have right now, if there just are various types of constraints which our customers are faced with each and every day, well, the AI infrastructure must survive economic cycles, not just technology cycles. So it's discipline. I think always bring it back to first principles, which is,

[00:22:21] am I delivering more value to my customer than anybody else possibly could? And you keep iterating and you keep refining and enhancing the technology until you say, well, I don't think it's possible to deliver more value than what we are. That's when you know that you have something valuable. And then just continue to listen, be very curious. You have to continue to listen to what customers' pain points are.

[00:22:51] Ask them, what are the pain points? What would an ideal outcome deliver? You know, one of the questions I love to ask customers is, okay, if I give you a blank sheet of paper and on that sheet of paper, you write everything that you need. And what you need is at a price point where you say, wow, this is great. What would that be? And you start from that.

[00:23:15] So design from the outside in, design with solving customers' business and financial requirements and pain points, and just stay razor-sharp focused on doing that each and every day. Because we live in a world, the world of technology, where things continuously change. And so you have to continuously understand what it is that delivers and maximizes value for customers and just keep doing that.

[00:23:42] And we have mentioned NVIDIA several times today, and I've got to ask them to try and get a teaser out of you now. What are DDN's plans at NVIDIA GTC 2026? Anything you can share about what we can expect there? Sure, sure. So a few things. I mean, at GTC, I think basically the industry is celebrating faster GPUs, more GPUs. You know, a trillion dollars invested in data centers, in power and in GPUs.

[00:24:11] What we're showcasing is DDN's data intelligence platform, which is the layer which actually monetizes and creates value from these massive investments. I mean, again, DDN is to data what NVIDIA is to compute. And it's only by bringing data and compute together that you create value from AI. So what we'll be showing at GTC, we will show AI factories that are running as industrialized economic systems, not as science projects.

[00:24:39] We're going to showcase specific AI factories in financial services, in life sciences, for neocloud providers, for enterprises, and for nations. So we will be powering industrialized, highly efficient rack-scale systems, with GPUs running 24-7 at a much smaller power footprint, at a much smaller data footprint,

[00:25:07] and much more efficient memory utilization. And we're also showing an AI factory bus, which we came up with a few months ago. We said, okay, let's just take a bus. In that bus, we're going to showcase real-world applications of AI with real-world business and financial outcomes. So customers will be able to see actual use cases with true value being delivered from AI, a bank compliance use case,

[00:25:35] a hedge fund complex model value creation, a protein folding use case, which speeds up pharmaceutical drug discovery, real-time identification of bad guys in public places, and on and on and on. So because I think at the end of the day, you have to demonstrate real-world use cases so that enterprises say, uh-huh, I can actually create value out of that, as opposed to, well, I should be doing AI, everybody is doing AI,

[00:26:04] what kind of AI should I be doing? You have to bring it home into their world, in their language, and in their use case, so that you can deliver real value. So these are the things that we will be showcasing at GTC, the industrialization of AI and actual concrete real-world enterprise use cases, which enterprises can deploy today and benefit from today in business and in financial outcomes.

[00:26:34] Absolutely. Love that. Exciting times indeed. And if I was to take you even further in the future, way past GTC 2026, if we fast forward to a world of fully realized AI factories where compute, networking, and data operate as one integrated system, what is it you think will separate organizations that can extract business value from those that just simply deploy more hardware? Do you see that gap widening? I think that's a great question.

[00:27:04] I think the winners, the ones who will actually extract and create real value from AI, and again, it's business and financial value, those winners will think like industrial operators. So I think they will measure what is my cost per token? What is my time to value? What is my rack utilization? What is my power consumption

[00:27:30] per unit of intelligence output, let's say? What is my time to intelligence? I mean, I think the ones who will be left behind are the ones who will be counting, okay, how many GPUs do I have? How many megawatts of power do I have? I think we're entering a phase where competitiveness in AI will be decided by infrastructure economics, not by the size of the model and the amount of power you have.

[00:27:58] I think the hardware, the infrastructure, the data center, you know, the racks of GPUs are creating possibilities. It is the economic efficiency that will create business value, and that's what will separate, I believe, the winners from the losers. And economic efficiency lives in the data intelligence layer which DDN is delivering. So it's really that. Think like industrial operators

[00:28:27] and don't think that you're investing in infrastructure just because everybody else is. Think with that infrastructure, what sustainable business and financial value am I delivering to my organization? And how does that pencil out in great ROI? Wow, and I think that is a powerful moment to end on. But before I do let you go, anyone listening, maybe they want to connect with you, your team,

[00:28:56] learn more about the announcements coming out of DDN and indeed out of GTC in March and also any other big announcements that I'm sure you've got coming on the horizon. Where would you like to send everyone listening? Our website, DDN.com, on LinkedIn, on X, at DDN Intelligence. But I think most importantly, if your listeners, enterprises, consumers, nations are using

[00:29:25] or building AI seriously, I urge everybody to start measuring efficiency because, I mean, really, in this industrialization of AI, the advantage will not be in the ones who bought the most GPUs or had the most power in their data centers. It will be the ones who made their investments the most productive to optimize their business and financial objectives. And that is really important.

[00:29:54] That is a mental pivot which I think organizations need to make. And that's how successful organizations run their business. They measure efficiency and they measure value creation. So, do the same for AI because that's what AI is today. 100% with you. And I cannot thank you enough for coming on here today and breaking down what NVIDIA's Rubin really means for AI infrastructure, why data movement is now setting

[00:30:24] the new speed limit and that big call to action there, measuring efficiency. People listening, if they take one thing away, please remember that. And obviously, DDN, nearly $1 billion in revenue, 40-50% year-on-year growth, profitable for over a decade. NVIDIA, using DDN internally for eight years, exciting time for yourselves as well. But just thank you for taking the time sharing your story and those invaluable insights. I will add links to everything you mentioned, but thank you

[00:30:54] for your time today. Thank you very much, Neil. Really, really appreciate it and have a great evening in the UK. So many things I enjoyed about this conversation because it didn't just feel theoretical. It felt like a behind-the-scenes look at what happens after the hype when enterprises start asking harder questions about efficiency, predictability, and whether the numbers actually add up. And I think Alex laid out a simple but challenging idea there.

[00:31:23] The next wave of AI winners will look like industrial operators. They're the ones that will be obsessing over cost per token, utilisation, and time to value because that is where the real separation happens once everyone can buy similar hardware. And I love the way that Alex kept pulling it back to outcomes because that is the part that leaders care about most, at least they should do. So if you want to keep up with DDN and everything that they're doing, especially in the run-up to NVIDIA GTC,

[00:31:53] you can find them at DDN.com. I'll also share links in the show notes where you can connect with Alex and his team. But as always, love to hear from you on this one. Are you seeing your organisation or others shifting from counting GPUs to measuring efficiencies? Or are you or others still stuck in that buy more hardware mindset? I'd love to hear your thoughts on what you're seeing out there. I want to see the world vicariously through your eyes. So please message me over at techtalksnetwork.com and I will speak

[00:32:22] with you all then. But that is it for today. So I'm off now. I'll speak to you again tomorrow. But until next time, don't be a stranger. Bye.