3100: How Denodo is Building Smarter AI with Real-Time Data Integration
Tech Talks Daily, November 26, 2024
27:14, 21.81 MB


Have you ever wondered what it takes to transform data management into a fearless, innovative journey that empowers businesses to thrive in a data-driven world? In this episode, I'm joined by Alberto Pan, Executive VP & CTO at Denodo, a global leader in logical data management and integration.

Alberto takes us through Denodo's remarkable growth story and its groundbreaking innovations, including the Denodo 9 Platform and its recent advancements in Generative AI.

As businesses grapple with the complexities of Generative AI, Denodo is stepping up with pioneering solutions. Alberto discusses how their collaborations with Google Cloud Vertex AI, Amazon Bedrock, Nvidia, and OpenAI are revolutionizing enterprise AI initiatives.

Through Retrieval Augmented Generation (RAG), Denodo is enhancing AI applications by connecting large language models with up-to-date corporate data, addressing challenges like security, privacy, and query accuracy. These advancements not only enable the creation of intelligent virtual assistants but also significantly improve enterprise AI's ability to meet real-world business needs.

Alberto also shares the unique features that set Denodo apart, emphasizing the value of its logical data platform for delivering data at the speed of business. From real-time unified data access to a semantic layer that ensures accuracy and governance, Denodo's innovations are driving efficiency and ROI for organizations across industries.

We'll also explore future trends in enterprise AI, from the importance of solid data foundations to the increasing convergence of structured and unstructured data. With insights into Denodo's partnerships and tools like the Denodo AI SDK, this episode is packed with actionable takeaways for anyone navigating the rapidly evolving data landscape.

How do you see Generative AI shaping the future of data management? Tune in to the discussion, and let us know your thoughts!

[00:00:03] Have you ever wondered how enterprises are leveraging new technologies to refine customer interactions and even enhance the precision of their digital responses?

[00:00:15] Well today I'm going to be joined by Alberto Pan. He's the CTO at a company called Denodo.

[00:00:22] And together we're going to discuss the exciting developments at the intersection of enterprise applications and generative AI.

[00:00:31] And Denodo has integrated with Amazon Bedrock to empower businesses with large language models that not only heighten security and privacy,

[00:00:41] but also pave the way for responsible AI within corporate settings.

[00:00:46] So my guest today is going to share how their cutting edge technology is revolutionizing the way that companies can develop GenAI applications

[00:00:55] while also ensuring that these powerful tools are both effective and ethically aligned.

[00:01:01] But enough from me. Let's get my guest on now.

[00:01:05] So a massive warm welcome to the show.

[00:01:09] Can you tell everyone listening a little about who you are and what you do?

[00:01:12] Yes, of course. Hi Neil. Thanks for having me.

[00:01:15] Well, my name is Alberto Pan and I am the chief technical officer of Denodo.

[00:01:20] And I've been leading our research and development organization since the company's inception.

[00:01:28] Although Denodo is now a global company with offices in 25 countries,

[00:01:33] we actually started in Coruña, a small city in the northwest of Spain.

[00:01:37] And I am still based there along with much of our research and development team.

[00:01:43] My background is primarily in data integration and management.

[00:01:47] And actually Denodo is a data management company that enables organizations to create a unified data access layer

[00:01:56] across multiple data sources updated in real time.

[00:02:00] We often call this unified layer a semantic layer because it allows expressing the data in the language of the business,

[00:02:09] making it easily accessible and usable for business users.

[00:02:14] And much faster than with traditional data integration methods that force you to consolidate all data up front.

[00:02:23] Well, as you said there, from my own research and the things I've been reading online,

[00:02:30] Denodo is known as a leader in data management, providing unmatched performance and unified access to the broadest range of enterprise, big data, cloud, and unstructured sources.

[00:02:41] But one of the things that really put you on my radar was Denodo's integration with Amazon Bedrock

[00:02:48] and how that is enhancing the capabilities of large language models for enterprise applications.

[00:02:55] So can you tell me a little bit more about that?

[00:02:58] And ultimately, when everyone's getting excited about AI, what are the benefits that this brings to organizations that are looking to leverage GenAI?

[00:03:07] Because there's a lot of hype around the topic, but I think learning more about what it does and the benefits that it brings.

[00:03:13] That's the magic, isn't it?

[00:03:14] Yeah, absolutely.

[00:03:15] Well, a common challenge that organizations face when building GenAI applications is how to leverage the data stored in their internal databases, right?

[00:03:28] For instance, if you are creating a GenAI application in a financial services company, say a chatbot for employees,

[00:03:37] I would want it to be able to answer questions like how many loans were granted this week, or how are sales going for this new financial product that we just launched, right?

[00:03:49] But in large organizations, that information is often spread across multiple databases.

[00:03:56] Also, that information changes in real time.

[00:03:59] And this makes it impossible to train or fine-tune models with this data that is changing constantly, right?

[00:04:07] So, Denodo's integration with Amazon Bedrock solves this by enabling these corporate GenAI applications to access and use real-time data from all their internal databases.

[00:04:22] So, these applications can provide accurate and up-to-date answers to questions like the ones I just mentioned.

[00:04:31] And to achieve this, the idea is that we first use Denodo to create this unified data access layer, this semantic layer that I was just mentioning.

[00:04:42] And then using the power of Bedrock and other components of the AI ecosystem like vector databases, we are able to translate user questions into queries over this unified data access layer.

[00:04:58] And these queries will be executed by Denodo to get the data we need from the databases of the organization in real time, right?

[00:05:07] And for this, we use a particular technology pattern that is called RAG, which stands for Retrieval-Augmented Generation.

[00:05:15] That is quite popular when creating these GenAI applications in big organizations.
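The RAG flow Alberto outlines — the model sees only metadata, proposes a query, and a separate engine fetches live data — can be sketched roughly in Python. Everything below is illustrative: the view names, the canned SQL, and the mocked model call are assumptions for the sake of the sketch, not the actual Denodo or Bedrock APIs.

```python
# Rough sketch of the RAG-over-a-semantic-layer flow described above.
# The LLM receives only metadata (view names, descriptions, columns),
# proposes a SQL query, and a separate engine runs it on live sources.

def build_prompt(question: str, metadata: dict) -> str:
    """Combine the user's question with semantic-layer metadata only --
    the prompt never contains actual row data."""
    views = "\n".join(
        f"- {name}: {info['description']} (columns: {', '.join(info['columns'])})"
        for name, info in metadata.items()
    )
    return f"Available views:\n{views}\n\nQuestion: {question}\nSQL:"

def mock_llm(prompt: str) -> str:
    """Stand-in for a Bedrock model invocation; returns a canned query."""
    return "SELECT COUNT(*) FROM loans WHERE granted_date >= CURRENT_DATE - 7"

def answer(question: str, metadata: dict, execute) -> dict:
    sql = mock_llm(build_prompt(question, metadata))
    return execute(sql)  # executed against live sources, in real time

metadata = {
    "loans": {"description": "Loans granted to customers",
              "columns": ["loan_id", "amount", "granted_date"]},
}
# The executor is stubbed here; in practice it would be the data layer.
result = answer("How many loans were granted this week?",
                metadata, execute=lambda sql: {"count": 42})
```

The key property, echoed later in the conversation, is that the model only ever sees the shape and meaning of the data, never the rows themselves.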

[00:05:23] And as a cautious ex-IT guy that gets nervous when people mention AI in corporate environments, I've got to ask the question.

[00:05:30] How do you ensure security, privacy and responsible AI when integrating your platform with Amazon Bedrock's large language models?

[00:05:40] Are there any particular safeguards or practices to manage data risks, for example?

[00:05:48] Yes.

[00:05:49] Well, actually, as you say, this is one of the main concerns that organizations have today using GenAI, right?

[00:05:55] And there are several layers to ensuring security and privacy.

[00:06:01] First, with this RAG pattern that I mentioned, which we use to integrate Denodo and Bedrock, the large language model doesn't need direct access to the organization's data, right?

[00:06:11] The large language model will only see what we call metadata, meaning data about data.

[00:06:18] So that means that Bedrock, in this case, will know details like what data views are available, what types of information they provide,

[00:06:27] and what the semantics and formats of the data columns in this unified data access layer are, but not the data itself.

[00:06:36] In summary, we can say that the large language model gets access to the shape and the meaning of the data, but it doesn't see the actual data itself.

[00:06:45] So that's the first part.

[00:06:46] Second, we also ensure that users only access the data that they can see in line with the company's security and privacy policies.

[00:06:57] For instance, if someone asks the question I mentioned before, how many loans were granted this week, they will only receive an answer if they have the needed privileges, the needed permissions, to access that type of data.

[00:07:10] And for this, Denodo supports fine-grained privacy controls and data masking rules to define exactly what data can be accessed by specific users or roles in the organization.
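The kind of role-based access and column masking described here can be sketched minimally as follows; the policy format, role names, and function names are invented for illustration and are not Denodo's actual configuration model.

```python
# Illustrative role-based access control with column masking: the query
# layer checks the caller's privileges before returning data, and masks
# restricted columns for less-privileged roles.

ROLE_POLICIES = {
    "loan_officer": {"views": {"loans"}, "masked_columns": set()},
    "intern":       {"views": {"loans"}, "masked_columns": {"customer_name"}},
}

def apply_policy(role, view, rows):
    policy = ROLE_POLICIES.get(role)
    if policy is None or view not in policy["views"]:
        raise PermissionError(f"{role!r} may not query {view!r}")
    masked = policy["masked_columns"]
    return [{col: ("***" if col in masked else val) for col, val in row.items()}
            for row in rows]

rows = [{"loan_id": 1, "amount": 10000, "customer_name": "Ana"}]
officer_view = apply_policy("loan_officer", "loans", rows)  # sees everything
intern_view  = apply_policy("intern", "loans", rows)        # name is masked
```

The point of the sketch is the ordering: the permission check and masking happen in the data layer, before any answer reaches the user or the model.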

[00:07:24] And on this podcast, I always try and focus on the business value and what problems we're fixing with any technology such as AI.

[00:07:32] And in terms of delivering accurate and tailored responses to customer prompts, I'm curious, what improvements have you observed in enterprise GenAI applications,

[00:07:43] especially using the Denodo platform with Amazon Bedrock?

[00:07:46] Anything you can share around here on the kind of things that you're seeing?

[00:07:50] Yes.

[00:07:51] Well, a key factor in enabling large language models like those offered in Bedrock to generate accurate queries over your internal databases is providing detailed information,

[00:08:06] what we call detailed metadata about the meaning and format of your data.

[00:08:11] And this metadata needs to be expressed in the language and terminology of the business because, well, that is what users will use in their questions, right?

[00:08:20] Your users will not use technical language and will know nothing about the internals of your databases.

[00:08:27] They will ask their questions in business language.

[00:08:31] So, if you take the naive approach of only giving the large language model technical metadata, like, I don't know, table names, table schemas, you know, the type of technical information that you have in a database,

[00:08:46] then even the best models like those offered by Bedrock will struggle to generate accurate queries.

[00:08:53] And Denodo helps with this by providing this semantic layer for unified data access that uses business terminology.

[00:09:02] And also Denodo enriches this with additional metadata, such as descriptions, relationships between data views, example values, column formats.

[00:09:13] And this allows the large language model to better understand your data and actually match that data with the queries or with the questions of your users.

[00:09:24] For instance, in a recent white paper that is available on our website, we used two standard benchmarks to measure the accuracy of this question-answering process over internal databases.

[00:09:38] And with the naive approach, we saw only 20% accuracy in the first benchmark and 50% in the second.

[00:09:45] But when using Denodo and Bedrock together in this way, accuracy increased to over 80%.

[00:09:52] So that's a very significant improvement.

[00:09:55] And of course, one of the showstoppers of any AI journey is data.

[00:10:00] And you kind of alluded to it a little there.

[00:10:02] It's the lifeblood of any AI implementation.

[00:10:05] So how does the Denodo platform address the challenges of data quality and accuracy in training large language models for Gen AI applications?

[00:10:15] And what role does data virtualization play in that entire process too?

[00:10:20] Yes.

[00:10:21] Well, there are two points of view or two things I would like to mention here.

[00:10:27] Yeah.

[00:10:28] First, when your data changes in real time, as in the examples that we have discussed so far, then you probably don't want to train or fine tune the large language model with the data itself.

[00:10:40] Because training is typically a batch process that is better suited for more static data.

[00:10:46] And for dynamic data, this is where Denodo's data virtualization comes in, providing a real-time solution without the need for constant retraining.

[00:10:56] But second, yeah, for the static data, where you do want to use training, organizations often face the challenge of pulling this data from multiple sources with very different levels of quality, different levels of structure, different levels of consistency.

[00:11:16] And the semantic layer provided by Denodo helps with this by presenting data in a consistent way.

[00:11:25] Also, in these semantic layers, you can apply data quality policies to validate the data.

[00:11:30] And you can also ensure uniform semantics.

[00:11:34] So, this helps guarantee that the data that you use for training is complete and reliable.

[00:11:42] And just to bring to life what we're talking about here, are there any real-world examples or indeed use cases where companies are seeing real tangible benefits from integrating Denodo with Amazon Bedrock and creating enterprise GenAI solutions?

[00:11:57] Anything you can share around that?

[00:11:58] Yes.

[00:12:00] Actually, there are many examples of joint customers of Amazon and Denodo benefiting from this integration.

[00:12:08] For instance, one example that comes to mind

[00:12:12] is a multinational IT company that is using this approach for self-service information across various operations such as sales, project tracking, and also the health of their data center services.

[00:12:30] So, for instance, employees can ask questions like, what service availability issues have we had in the last two weeks or things like that.

[00:12:39] Right?

[00:12:40] For instance, another example is a law firm that is using this solution to answer real-time questions about customer metrics and customer engagement.

[00:12:51] So, for instance, if they are meeting with a customer today, then they can ask questions such as, what is the satisfaction level of the client I am meeting?

[00:13:04] Or how has their engagement with our services evolved over time?

[00:13:10] Right?

[00:13:10] That's another good one.

[00:13:12] And maybe one of my favorites, because it's already in a very advanced stage and very heavily used inside the organization, is a global healthcare company that has implemented this approach to create what they call a clinical data fabric that allows users to access data about patients, about healthcare practitioners,

[00:13:40] and also about clinical studies in real time.

[00:13:44] As you can imagine, privacy and security played a very important role in preparing this particular use case.

[00:13:54] And I think the results have been very successful so far.

[00:13:59] And with the growing emphasis on responsible AI, it's a topic I hear at every tech conference I go to around the world.

[00:14:06] With that in mind, how are you helping organizations navigate some of those ethical considerations when deploying large language models in business environments?

[00:14:16] Because I suspect this is a question you get asked a lot too, right?

[00:14:20] Yes.

[00:14:21] Well, I think one key point when navigating these ethical considerations is data governance, having a solid data governance foundation.

[00:14:32] Because when we have data, as it often happens in big organizations, distributed across many data sources in multiple formats and with different semantics and different levels of data quality, it can be very difficult to ensure that you are not introducing inaccuracies and bias.

[00:14:56] We think that Denodo can help with this because the unified data access layer that Denodo creates across all the internal databases and data systems of the organization provides a single entry point to define and also to enforce these security, privacy and governance rules across all the data sources in the organization.

[00:15:24] And also across all the data consumers.

[00:15:27] And this is really one of the required previous steps to ensure an ethical use of the data.

[00:15:34] So let me also be clear.

[00:15:37] Denodo will not entirely solve the problem of addressing ethical considerations by itself.

[00:15:43] You will need to have a solid practice on top of it.

[00:15:47] But we think it can provide a solid data foundation that you can rely upon as one of the building blocks for this strategy.

[00:15:58] And I suspect there'll be many business leaders listening wanting to go all in with Gen AI applications without any understanding of the development process and the work that the developers will need to do.

[00:16:11] So on behalf of every developer listening, how does the integration of Denodo with Amazon Bedrock, how does that streamline the development process for those new Gen AI applications?

[00:16:21] And are there any specific features or capabilities that might make it easier for enterprises to get started?

[00:16:28] Yes. To streamline the process and to try to make this as easy as possible,

[00:16:35] we are releasing what we call the Denodo AI SDK.

[00:16:38] This is a software development kit that developers in organizations can use to quickly create these Gen AI applications.

[00:16:48] This SDK provides APIs and utilities for quickly integrating Denodo and Bedrock

[00:16:56] and also for making it easy to start building these Gen AI applications integrated with their particular ecosystem and leveraging real-time data from all their internal databases and data systems.

[00:17:13] This SDK is open source and can be used not only with the commercial versions of Denodo, but also with Denodo Express, our free community edition

[00:17:23] that can be downloaded from our website.

[00:17:28] So even if you are not a Denodo customer, even if you are not using Denodo currently, you can still get the benefits that we have been mentioning.

[00:17:40] And I encourage listeners who are interested in creating this type of GenAI application to try it out for themselves to see the benefits in their particular scenarios.
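To give a feel for how an application might sit on top of an SDK like the one Alberto describes, here is a hypothetical sketch; the endpoint path, payload shape, and class names are assumptions for illustration, not the real Denodo AI SDK interface. The transport is injected so the flow can be exercised without a live server.

```python
# Hypothetical client wrapper around an AI-SDK-style question-answering
# endpoint. The transport callable stands in for an HTTP POST so the
# control flow can be demonstrated without any running service.

class AISDKClient:
    def __init__(self, base_url, transport):
        self.base_url = base_url
        self.transport = transport  # callable: (url, payload) -> dict

    def ask(self, question):
        payload = {"question": question, "mode": "data"}
        reply = self.transport(f"{self.base_url}/answerQuestion", payload)
        return reply["answer"]

def fake_transport(url, payload):
    # Stand-in for a real HTTP call; echoes a canned answer.
    return {"answer": f"(demo) You asked: {payload['question']}"}

client = AISDKClient("http://localhost:8008", fake_transport)
reply = client.ask("How many loans were granted this week?")
```

In a real application the transport would perform an authenticated HTTP request, and the SDK's service would handle the metadata retrieval, query generation, and permission checks discussed earlier in the episode.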

[00:17:52] And looking ahead into the future, I mean, we're now weeks rather than months away from life in 2025.

[00:17:58] So I've got to ask, are there any trends that you anticipate in the evolution of large language models and Gen AI within enterprise applications next year and beyond?

[00:18:09] And how are you maybe positioning yourself to try and stay ahead in this space too?

[00:18:13] Well, first of all, the Gen AI landscape is obviously changing very fast.

[00:18:21] Yeah.

[00:18:22] So it's hard to make predictions, but inside the data realm, I will mention a couple of things that I see happening in the next couple of years.

[00:18:36] So what are you thinking about?

[00:18:37] Well, first, starting on the present, I think today most organizations are focused on building a solid data foundation for Gen AI.

[00:18:47] And for instance, I was reading this morning a Deloitte report on the state of Gen AI in the enterprise.

[00:18:53] And in that report, 55% of organizations identified data-related issues as the biggest barrier to adoption.

[00:19:03] So I think in 2024 and 2025, we will see a lot of effort in this area.

[00:19:09] And we believe Denodo can play a significant role in addressing these challenges.

[00:19:14] Beyond that, once we have that foundation, I think the next big step for Gen AI in large enterprises, at least from the data perspective, will be using it to finally make information self-service a reality across the entire organization.

[00:19:30] Because if you think about it, and despite significant investments in the last years or even decades, I would say that self-service data access and data democratization is still relatively uncommon in big organizations.

[00:19:44] And it's often limited to power users with some type of technical knowledge, right?

[00:19:50] And Gen AI has the potential to change this.

[00:19:52] It can be the force that finally makes data democratization and data self-service a reality in big organizations.

[00:20:00] We are already starting to see examples of this, but I think we are still in the initial stages.

[00:20:06] So in the next two years, we will see, I think, how information self-service spreads in a big way across all big organizations in the world.

[00:20:18] And once again, we think that Denodo can contribute to this by providing this GenAI-enabled semantic layer for unified data access.

[00:20:30] And another impact of Gen AI in the data world that we have not mentioned so far is, I think, that the distinction between structured data, like the data that you have in databases or that you access through APIs,

[00:20:45] and unstructured data, meaning the data that you have in documents or in media, images or videos and so on, this distinction will start to slowly fade away.

[00:20:57] Because now these two types of data are dealt with with two totally different technology stacks that are almost completely independent, because previously there were not good ways to link the semantics of both worlds.

[00:21:12] But now with large language models, we are starting to see methods with very reasonable accuracy to bridge this gap.

[00:21:21] And I think this will also have a very strong impact in the data world that we are only starting to see.

[00:21:29] And of course, outside of the pure data realm, outside of the data domain, there are many other application areas for Gen AI in the enterprise, such as automation and optimization.

[00:21:45] Also, outside of the technical areas, we will see in the next few years companies investing in adapting to the emerging regulations around AI.

[00:21:56] And again, we think that a solid data foundation will be crucial for this.

[00:22:02] Well, thank you so much for taking the time to sit down with me, share your insights with everyone listening today.

[00:22:08] But before I let you go, I'm going to ask you to leave one final gift for our global audience.

[00:22:13] And that is, can you leave us with a book that has inspired you or mean something to you, or that you would just recommend that everyone listening can check out?

[00:22:22] And I'll add it to our Amazon wishlist so people can check out all the books that guests have recommended.

[00:22:27] But what book would you like to leave and why?

[00:22:29] Well, one of my all-time favorite books is Fictions by the Argentinian writer Jorge Luis Borges.

[00:22:38] It's a collection of 17 short stories that are incredibly inventive.

[00:22:44] They are filled with philosophical and mathematical concepts that remain inspiring and thought-provoking even today, more than 80 years after they were written.

[00:22:55] For instance, there is one story in the book called Funes the Memorious that tells the story of a man with perfect memory recall.

[00:23:06] But paradoxically, this perfect recall also limits his ability to think in abstract terms.

[00:23:14] For instance, when you read it now, it's hard not to find parallels with large language models, right?

[00:23:21] They are also great at recalling information that they have seen in their training data,

[00:23:28] but sometimes have limitations in using that information in a more abstract way.

[00:23:34] So there are very, very surprising and strong connections between these stories that were written more than 80 years ago and many topics in technology today.

[00:23:44] And other stories, for instance, like the Library of Babel and the Lottery in Babylon are also very evocative, I think, for data and technology professionals, even today.

[00:23:56] So I think it would be a great addition to the library of the podcast.

[00:24:02] Wow, you are a class act, my friend.

[00:24:05] I will proudly add that to our Amazon wish list.

[00:24:07] And kudos to you for not adding a self-help or business book.

[00:24:12] There are so many of those out there.

[00:24:14] It's great to hear something with a touch of class.

[00:24:17] So I'll be getting that added straight to the wish list.

[00:24:20] And for anyone listening just wanting to find out more about Denodo and everything that we discussed today,

[00:24:25] anywhere you'd like to point everyone listening who wants to maybe ask a few questions or find more information?

[00:24:32] Yes.

[00:24:33] There are a lot of interesting resources related to this on our website.

[00:24:37] We have a specific section about Generative AI with a lot of interesting resources, including the ones that I mentioned here.

[00:24:50] And, of course, I will be more than happy to answer any other questions or to discuss these topics with anyone interested through my LinkedIn account.

[00:25:02] You can simply search for Alberto Pan Denodo and I will be happy to discuss.

[00:25:10] Well, I'll add all the details there along with a link to your LinkedIn profile so people can connect with you nice and easily.

[00:25:17] And I think when talking about things like integration with Amazon Bedrock, large language models, it can be quite daunting and complex.

[00:25:25] But you've put it in language that everyone can understand here.

[00:25:28] And I'm confident that both techies and business leaders will have some powerful takeaways on how this integration with Amazon Bedrock is enabling enterprise Gen AI applications to deliver more accurate,

[00:25:43] more tailored responses to customer prompts, not to mention streamlining the development of powerful new Gen AI enterprise applications with security, privacy and responsible AI in mind.

[00:25:54] And just a few takeaways for myself.

[00:25:56] But thank you for shining a light on this today.

[00:25:59] Much appreciated.

[00:26:00] Thank you.

[00:26:00] Thank you, everyone, for listening.

[00:26:01] And thank you, Neil, for having me.

[00:26:03] Big thank you to Alberto there for sharing his insights about Denodo's integration with Amazon Bedrock and its impact on Gen AI applications right across the enterprise sphere.

[00:26:15] And it's clear that the path towards more responsive, tailored, and ethically conscious AI-driven applications is being shaped by these advanced technologies.

[00:26:26] And listeners, what are your thoughts on the future of Gen AI in businesses?

[00:26:31] Have you already encountered any applications that have helped transform your experience?

[00:26:36] Or are you still in the very early stages?

[00:26:38] Because I seem to get a lot of varying opinions on this.

[00:26:42] And I'd love for you to join the conversation.

[00:26:45] And you can do that by emailing me at techblogwriter@outlook.com, or on LinkedIn, X, Instagram, just at Neil C. Hughes.

[00:26:53] Love to hear from you.

[00:26:54] But I'm afraid we're out of time for today.

[00:26:56] I'll be back again tomorrow with another guest.

[00:26:58] But thank you for listening as always.

[00:27:00] And hopefully I will get to speak with you all again bright and early tomorrow.

[00:27:04] Bye for now.