PuppyGraph at IT Press Tour: Zero-ETL Graph Analytics on Your Existing Data | The Tech Talks Network

What does “infrastructure” mean when your data stays exactly where it is, yet suddenly behaves like a graph?

I met Weimo Liu, CEO and co-founder of PuppyGraph, during an IT Press Tour presentation, and I wanted to bring his story to Infrastructure As A Conversation because this is a data infrastructure conversation at its core. Weimo’s pitch is simple to say and harder to pull off: keep a single copy of data in your lake or warehouse, skip the ETL pipelines, and still run graph queries with subsecond performance.

Weimo’s background explains why this is more than a clever demo. He worked at TigerGraph, then on Google’s F1 team, and PuppyGraph sits right between those worlds. In our conversation, he walks me through how they treat graph queries as a set of node and edge operations that can be optimized, parallelized, and evaluated in a vectorized way, which is how they keep performance predictable when workloads get real.

We also get into the practical details infrastructure teams care about. PuppyGraph is a read-only engine, which changes the trade-offs around concurrency, governance, and operational risk. Instead of copying data into a separate graph store and building a second set of controls, you can query relationships where the data already lives, then write results back into the lake for other engines to consume. The upside is simpler architecture and less duplication. The compromise is that you are not getting transactional graph updates, and Weimo is clear about why that is acceptable for the OLAP-style workloads his customers run.

From there, the use cases start to make sense fast: cybersecurity teams with logs sitting in object storage, fraud detection scenarios where latency matters, and internal AI chatbots that struggle with too many tables and brittle SQL generation. Weimo has a sharp analogy for that last part: text-to-graph queries behave more like a train on rails, which can help AI stay inside defined relationships and reduce messy answers.

If you are building modern data platforms and you are tired of pipelines multiplying, this episode is a thought-provoking look at what happens when graph analytics becomes a query layer rather than a destination system. And it all started with a dog-themed name and a surprisingly cheap domain.

[00:00:01] I'm joined by the CEO and Co-Founder of PuppyGraph, and they're shaking up the enterprise data world with a bold promise: no more ETL, just connect your existing tables, query them as a graph, and do it instantly. It is a concept born from the fusion of two very different experiences: a graph database startup and Google's legendary F1 query engine. And out of that unlikely pairing came

[00:00:31] the idea that graph queries shouldn't be locked behind pipelines, copies or complexity. So today we're going to talk about why enterprise graph tech has struggled to take off, how PuppyGraph's Zero-ETL architecture flipped conventional thinking, and what happens when you combine structured data with sub-second graph analytics. Because from cybersecurity to AI-driven chatbots, we're going to explore it all today.

[00:00:59] And the real-world use cases that are already providing value and impressive outcomes. And yes, we'll also get to the bottom of the origin story of the name PuppyGraph, and it is as fun as it sounds. But enough from me. Let's get my guest on now. So a massive warm welcome to the show. Can you tell everyone listening a little about who you are and what you do? Hi, everyone. It's a great honor for me to be here. Yeah.

[00:01:28] And I'm the CEO and co-founder of PuppyGraph. We build a graph query engine on your existing data: query data as a graph without ETL. Yeah. And I've got to ask, because I always like to find out more about the origin story of my guests: what was it that inspired the Zero-ETL architecture of PuppyGraph? And how does it challenge maybe the traditional assumptions around graph databases that need

[00:01:57] separate storage and pipeline infrastructure? Tell me more about that. Yeah. It's kind of a merge of my previous two jobs. My first job was at a graph database startup called TigerGraph. My second job was on the Google F1 team, working on a unified SQL query engine inside Google. And then I thought, oh, this is a very interesting design, since F1 can query all the data inside

[00:02:26] Google without moving it. I think if this can support a graph query language, it will be even more beneficial, since it removes, I think, the major blocker for graph technology. Yeah. And while PuppyGraph claims sub-second performance on massive data sets, I've got to ask, how do you ensure query reliability and performance consistency in real-time environments where latency is much

[00:02:55] more critical in areas like fraud detection or security analytics? Yeah. So we have several ways to handle this problem. I talked with some of our friends from Trino and Spark, and they have built graph engines on top of other engines like Spark or a SQL query engine. The issue is that it's not optimal for graph: it's slow because it's not optimized for graph.

[00:03:22] What we are doing is that we define the basic operators as node actions and edge actions, and the input is a collection of nodes or a collection of edges. After that, any graph query can be a combination of node actions and edge actions, and then we can do cost-based optimization. And for a single action, because the input is a collection of nodes or edges,

[00:03:48] then we can do it in parallel and also vectorize the evaluation. In this case, we can make it very fast. And at the same time, we are a distributed engine. So more machines, better performance. So people can always pay more for hardware to speed up. This is very different from most of the graph databases on the market. And one of the persistent barriers to graph analytics adoption has always been developer experience.

[00:04:18] So what steps have you taken at PuppyGraph to make graph schema creation and query authoring more accessible to teams used to SQL, for example? Yes. From my experience at TigerGraph engaging with a lot of customers, people need to prepare a very complex data pipeline to load data from somewhere else into a graph database, and then run graph queries and graph algorithms.

[00:04:46] I think this makes the adoption of graph very slow. And if people change their mind about their graph schema, they need to reload all the data, which is a disaster. What we are doing is that people just need to tell us, oh, this table is a type of node, and this table is a type of edge. And after that, physically they still have a set of tables,

[00:05:16] but logically they already have a graph. Then they can just run graph queries or graph algorithms as they would on a graph database. So they don't need to rewrite the data. They just have a logical graph schema and its mapping to tables. And the underlying logic and the complexity, PuppyGraph will handle for users.
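
The mapping described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the table names, the layout of the schema dictionary, and the helper functions are all hypothetical, and none of it is PuppyGraph's actual schema format or API. It only shows the idea that the tables stay physically unchanged while a logical schema says which table is a node type and which is an edge type.

```python
# Hypothetical example: names and schema layout are illustrative only,
# not PuppyGraph's actual schema format or API.

# Physical layer: ordinary tables that stay where they are.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 3, "name": "carol"},
]
transfers = [
    {"src": 1, "dst": 2, "amount": 50},
    {"src": 2, "dst": 3, "amount": 20},
]
TABLES = {"users": users, "transfers": transfers}

# Logical layer: declare which table is a node type and which is an edge type.
GRAPH_SCHEMA = {
    "nodes": {"User": {"table": "users", "id": "user_id"}},
    "edges": {
        "SENT": {
            "table": "transfers",
            "from": "src",
            "to": "dst",
            "to_type": "User",
        },
    },
}


def neighbors(schema, tables, edge_label, start_id):
    """One-hop traversal resolved directly against the underlying tables."""
    edge = schema["edges"][edge_label]
    node = schema["nodes"][edge["to_type"]]
    # Scan the edge table for rows leaving the start node.
    target_ids = {
        row[edge["to"]]
        for row in tables[edge["table"]]
        if row[edge["from"]] == start_id
    }
    # Look the targets up in the node table; no data is copied or reloaded.
    return [row for row in tables[node["table"]] if row[node["id"]] in target_ids]


def reachable(schema, tables, edge_label, start_id, hops):
    """IDs of nodes exactly `hops` steps away, by repeating the one-hop step."""
    edge = schema["edges"][edge_label]
    id_col = schema["nodes"][edge["to_type"]]["id"]
    frontier = {start_id}
    for _ in range(hops):
        frontier = {
            row[id_col]
            for node_id in frontier
            for row in neighbors(schema, tables, edge_label, node_id)
        }
    return frontier


print(neighbors(GRAPH_SCHEMA, TABLES, "SENT", 1))
print(reachable(GRAPH_SCHEMA, TABLES, "SENT", 1, 2))
```

In this sketch, changing the graph model means editing only `GRAPH_SCHEMA`. The rows in `users` and `transfers` are never rewritten, which is the point made in the conversation about not having to reload data when the schema changes.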

[00:05:41] And when doing a little research on you, I quickly learned that PuppyGraph promotes a single copy of data, multiple query engines philosophy. So how do you handle concurrency and governance, especially when graph queries run alongside SQL queries on the same data sets? We are a read-only engine.

[00:06:06] So this is a big advantage of PuppyGraph as well. With one single copy of data, you can support a SQL query engine, Spark, or graph queries at the same time. You don't need to load it into, for example, a graph database, where you can no longer use SQL. And now, because there is only one copy of data, you can also use other engines.

[00:06:33] And at the same time, we can batch write the results back to the data lake. Because it's batch writing, the results can be leveraged by other engines, and there won't be a conflict. But of course, we sacrifice some features, like the transactional updates of a graph database. So we don't support ACID at all.

[00:06:59] We are just a read-only engine that catches the latest updates from the data source. In this case, people gain performance and scalability for OLAP workloads. But at the same time, the user no longer has transactional features like ACID. But based on feedback from our users, because they already use a data lake or data warehouse, ACID is not what they want.

[00:07:25] So we sacrifice what they don't want and make the features, performance and scalability better for the users. Yes. And when looking at where this technology delivers the most value, which verticals do you see as the most ready for graph analytics at scale? And where is maybe market education still needed in other areas? Where are you focusing here?

[00:07:55] We see some product-market fit already. I think the first one is cybersecurity, and we are very unique in this area. A lot of cybersecurity companies store all their data, all their logs, in an object store like S3. They still want a graph, but it's impossible for them to load the data into a graph database because the data volume is too big.

[00:08:22] And at the same time, a single row of data is not very valuable, because machine-generated data is produced every second. Because they already use an object store like S3 to store the data, we sit on top of it and just bring out the insights from their logs. Then they can have some very complex features in their dashboards.

[00:08:51] And they are also very happy with the sub-second performance. We also have some other use cases, like chatbots, since AI is popular. The large language model is very popular. And some users want a kind of AI data hub because they want to answer data-oriented questions. At the beginning, they tried text-to-SQL.

[00:09:17] And they ran into a problem: the real blocker for them is that they have so many tables, and they want to onboard more and more tables. In this case, if they just do a prompt with SQL, it's very complicated. But now they can just add more and more tables into a graph schema and then just do some prompting and generate graph queries.

[00:09:48] In this case, the prompt engineering is much easier. And based on their feedback, their internal customers are pretty happy with the chatbot. It has already generated some value. And of course, there are some other industries, like anti-fraud and supply chain.

[00:10:11] And especially when the data is big, we are kind of the unique solution in the area because it's scalable and the cost is very low. Yeah. And what role do you see GraphRAG playing in shaping the future of enterprise AI applications, especially, and I would say this is probably the most important part at the moment, in reducing hallucinations and improving response accuracy?

[00:10:38] What role do you see GraphRAG playing there? In my understanding, text-to-SQL is something like a self-driving car, and text-to-graph-query is kind of a self-driving train. So there are already rules there. One type of node can only connect to several types of edges.

[00:11:03] In this case, when you do the prompt, you just let the LLM know the semantics of each type of edge and each type of node, and then it can have a very good understanding. Otherwise, you need to teach the LLM which foreign key these two tables can join on.

[00:11:24] And especially when there are many tables in the enterprise, which is a very common situation, it will be very hard to do the prompt, especially when some customers want to onboard more internal teams inside their company. It's very hard. They will spend a lot of time onboarding more tables and more semantics.

[00:11:52] And you do have a relatively small team. I think you've got 15 employees and $5 million in seed funding. So how are you prioritizing between engineering innovation, go-to-market growth and building out your partner ecosystem? You've got a lot going on here. How do you get that balance right? Yeah, well, it highly depends on customer requests.

[00:12:15] We have a roadmap, but the priorities change all the time, and it's really based on customer requests. We believe that the features customers want are the useful features. Yeah, so we don't want to build a car internally, without any outside feedback, and then find that no one wants to drive it.

[00:12:41] So we want all the features to be based on customer requests. But of course, we will discuss with the customer whether it's a real request or something unnecessary. Usually our customers are very smart. They provide some very useful feature requests, and those requests can be leveraged by other customers as well. Yeah.

[00:13:10] And if we look ahead, I'm going to try and get some teasers out of you of what we can expect next year and beyond. What would you say are the most critical features or indeed capabilities that you plan to develop in the next 12 months and ultimately help you stay ahead of potential competition from traditional database or graph vendors that are out there? Anything you can share around your plans? Yes.

[00:13:35] First, we want to make our automatic graph schema generator better. In this case, users can just connect to their data lake or data warehouse and the graph schema can be auto-generated. And at the same time, we want to make our MCP and also the LLM framework more robust and easy to use.

[00:14:04] In this case, the whole pipeline can be smooth: connect to the data source, auto-generate the graph schema, auto-prompt the LLM, and the user can just ask it a question. I think this is the main feature we have in development. Yeah. Exciting times ahead.

[00:14:30] And today on this podcast, you've shared your origin story with me, where you are now, what you're building for the future. But of course, none of us are able to achieve any success without a little help along the way. So as we come full circle, if you look back at your career, is there a particular person you might be grateful towards, someone who helped get you where you are? Maybe we can give that person a shout-out today, but who would it be and why? Yeah.

[00:14:59] I have thought about this problem for many years, since I was at TigerGraph. I think it's a very cool technology and the performance is good. People were reaching out to us, and we already had some branding, but graph technology is still hard to adopt. And I think it's a very different system, very different from academia.

[00:15:23] When I was a PhD student, I read a lot of papers about graph technology, but it's very hard for industry to adopt it. It wasn't until I joined Google that I thought, oh, maybe the reason is that ETL is the main blocker, since when I was at F1, the advantage of no ETL made a lot of internal teams at Google connect to F1.

[00:15:52] At the beginning, I think we only supported several data sources, but then more and more teams with more and more data sources and data formats reached out to F1 and did the integration. I thought, oh, this may be the reason why it can be the unified SQL query engine. And maybe it is the key to removing the blocker for graph adoption.

[00:16:17] And based on the feedback from the last two years, I think we found the answer, and our customers are pretty happy about this. Yeah. It's the main reason they choose PuppyGraph: their data is so big and they don't want to do the ETL. They just want to get the value first and then invest more. Yeah. Love it.

[00:16:46] And for anybody interested in finding out more information on all things PuppyGraph, a graph query engine for all your data, where would you like to point them? Where can they dig a little bit deeper and find out more information about it? Yeah, our website is puppygraph.com, and the domain name only cost us $3. Yeah. Yeah.

[00:17:11] And we put a lot of demos and technical blogs there, and we have some summaries of different graph technologies and how different industries leverage graph technology. It's not only about PuppyGraph; we also do some investigation and share the knowledge we have about graphs. Yeah. Well, I'll add a link to everything. Before I let you go, I have a question I've got to ask: where did you get your name from?

[00:17:40] Where did puppy come from? You said it was a $3 domain name. That's got to be attractive, but was there another story behind choosing PuppyGraph? Firstly, we have a puppy in our team. And the second reason is that one of our angel investors was the first VP of marketing at Neo4j.

[00:18:02] And he told me to choose something cute that everybody is familiar with, plus what you are doing. So it's PuppyGraph. Yeah. And of course the third reason is that the domain name only cost us $3. Oh, what a brilliant story. Well, I'd love to stay in touch with you. See how your story continues to evolve, but more than anything, thank you for joining me today and sharing your story on all things PuppyGraph.

[00:18:31] I wish you the best of luck for the future, but thank you for sharing it today. Yeah. Thank you so much. Yeah. A real treat meeting my guest at the IT Press Tour. And I don't know about you, but there's something refreshing about seeing deep technical innovation delivered with such clarity and good humor too.

[00:18:51] And the idea that you can keep your data right where it lives and still tap into the power of graph analytics feels like a genuine unlock for so many teams. And what stood out to me most was this focus on solving real pain points, not by adding more layers or buzzwords, but by simply removing barriers. No ETL, no reloading, just one logical schema layered over your existing tables.

[00:19:20] And it's fast, it's distributed, and it's built with the kind of engineering precision that you'd expect from someone who's been inside both TigerGraph and Google. And of course, there's the branding. In a space that usually takes itself far too seriously, PuppyGraph is a timely reminder that powerful tech can still be playful. But these are my takeaways. What did you think? Are graph queries finally ready to break free from their niche?

[00:19:48] And is the future of AI-powered analytics actually hiding in your table joins? Let me know. I'd love to hear your thoughts. techblogwriter@outlook.com. LinkedIn, X, Instagram, just @NeilCHughes. Pop over to Tech Talks Network. You'll find eight different podcasts that I host. And you'll also find everything that I've recorded at the IT Press Tour on the Infrastructure As A Conversation podcast. But that is it for today. So thank you for listening as always.

[00:20:18] And I'll speak with you all again soon.