How Precisely Is Closing the AI Data Integrity Gap
Tech Talks Daily, June 24, 2026
3616
26:00, 18.41 MB

Can organizations really call themselves AI-ready if their data foundations still have gaps?

In this episode of Tech Talks Daily, I sit down with Dave Shuman, Chief Data Officer at Precisely, to discuss the findings from the company's latest State of Data Integrity and AI Readiness Report. Drawing on insights from more than 500 senior IT leaders across the US and Europe, Dave explains why many organizations are confident in their AI readiness while simultaneously identifying infrastructure, data quality, and governance as their biggest obstacles.

Our conversation focuses on what Dave describes as the AI data integrity gap, the growing disconnect between ambitious AI initiatives and the quality, consistency, and context of the data powering them. We explore why successful AI projects often perform well in controlled pilot environments before struggling when deployed at scale, and why many organizations continue to underestimate the importance of data lineage, semantic layers, governance, and observability.

Dave also shares why he believes data governance and AI governance should be treated as a single discipline rather than separate initiatives. We discuss how businesses can move beyond vanity metrics such as token usage and agent counts to focus on outcomes that genuinely matter, including revenue growth, cost reduction, customer experience, and risk management.

As the conversation turns to the future of agentic AI, Dave offers a practical perspective on what autonomous systems will require of organizations and why trust in data will become increasingly important as AI assumes greater responsibility behind the scenes.

If your organization is investing heavily in AI and looking for measurable business value, this episode offers a timely reminder that successful AI strategies begin long before the first model is deployed. They begin with data integrity. Based on Precisely's latest research, Dave explains why companies making progress are focusing less on the latest AI tools and more on laying the foundations that enable those tools to deliver reliable outcomes.

What role does data integrity play in your organization's AI strategy, and are you confident your data is truly AI-ready?


[00:00:00] - [Speaker 0]
The leading issue of agentic AI in businesses right now is ensuring agents act within compliance guidelines, and Denodo applies guardrails across your entire data estate. By aligning your company's data infrastructure under one system, these guardrails perform consistently across your platform. So start scaling your business, and start with Denodo. Simply visit denodo.com to learn more. What if the biggest barrier to AI success has less to do with the model that we're all obsessing over and far more to do with the data that is feeding that model?

[00:00:43] - [Speaker 0]
Well, my guest today is Dave Shuman. He's the chief data officer at a company called Precisely, an organization focused on data integrity and on helping organizations make better decisions from data they can trust. My guest describes himself as a data whisperer, which feels incredibly fitting for a conversation that goes beneath the surface of AI readiness. Because while many leaders are under pressure right now to show fast returns from their AI projects, the reality inside many organizations is sadly far messier. Platforms have been bought, pilots have been launched, and confidence is being projected.

[00:01:26] - [Speaker 0]
But the data foundations are often still full of gaps. So in today's episode, we will discuss Precisely's latest State of Data Integrity and AI Readiness Report. There are so many big stats in there, but one of the standout findings is the disconnect between perception and reality. Many leaders say they're ready for AI, but their infrastructure, skills, governance, and data quality remain persistent obstacles. Dave will explain why successful AI depends on quality, governance, enrichment, context, and semantic layers. He'll also share why pilot projects often look impressive in nice, safe, controlled environments but struggle when they move into production, where agents face broader datasets, real users, and far fewer guardrails.

[00:02:22] - [Speaker 0]
We'll also talk about ROI, why many organizations are measuring the wrong things, and why AI value must be tied back to revenue, cost, service, and risk. So if autonomous AI amplifies whatever data it's given, good or bad, is your organization spending enough time fixing the foundation before asking AI to make decisions? Well, enough from me. It's time to bring Dave onto the podcast, and I cordially invite you to listen in with me. So thank you for joining me on the podcast today.

[00:03:01] - [Speaker 0]
For everyone listening, could we begin by just telling them a little about who you are and what you do?

[00:03:06] - [Speaker 1]
Neil, thanks for having me on. My name is Dave Shuman. I'm the chief data officer at Precisely. And you might ask, what is Precisely? Precisely is a global leader in data integrity.

[00:03:16] - [Speaker 1]
We help organizations ensure their data is accurate, consistent, and contextual so they can trust their data to make better decisions. At Precisely, I lead our data strategy, which covers governance, analytics, and now AI enablement, as well as the data engineering and architecture teams that sit underneath it all. I've spent my career listening to what the data is and isn't saying, and to the signals below the surface. And if I had to boil that down to one pithy little statement, I would call myself a data whisperer.

[00:03:46] - [Speaker 0]
Oh, I like that. The data whisperer. That's got a real ring to it, hasn't it?

[00:03:51] - [Speaker 1]
I'm certainly not a horse whisperer; I have proven that emphatically. But I'll go with data whisperer.

[00:03:57] - [Speaker 0]
Okay. Sounds like there's another podcast episode right there. But one of the reasons you came onto my radar, and why I was excited to speak with you, was flicking through the recent State of Data Integrity and AI Readiness Report. For people hearing about it for the first time, tell them a little bit about what it is and maybe summarize some of the key findings, because there are some pretty big stats in there, aren't there?

[00:04:21] - [Speaker 1]
It is. So we did the study in conjunction with Drexel University's LeBow College of Business, and we surveyed over 500 IT professionals, mostly in senior leadership roles, in the US and Europe. The headline for me was a disconnect between how leaders feel and how ready their data actually is. I call it the AI data integrity gap. It's where AI value gets stuck.

[00:04:57] - [Speaker 1]
What we saw is that "AI ready" is often interpreted as "we bought the platforms, we ran the pilots," but real readiness is operational capability: ownership, lineage, metadata, life cycle management, monitoring, and KPI-linked outcomes. 87% of the respondents said their infrastructure is ready, but half still cited infrastructure as a key challenge. That really tells us the gap isn't hardware or software. It's a full-stack capability problem. The skills data showed shortages spread evenly across all capabilities.

[00:05:35] - [Speaker 1]
So there's really no single hire or role that would fix this.

[00:05:39] - [Speaker 0]
And if we just double-click on that disconnect between leaders' perception of AI readiness and the reality of the obstacles in the way, I'm curious: from your perspective, what are organizations underestimating most when it comes to becoming AI ready? What are they forgetting?

[00:05:56] - [Speaker 1]
I think the perception is that AI is about the tooling: how we can build our agents and how we can transform the organization around them. The real disconnect is that the foundation isn't there. It's what I call below-the-waterline effort. That's lineage. It's the data definitions.

[00:06:17] - [Speaker 1]
It's profiling. It's enrichment. That's quiet, tedious work, but it's what makes AI trustworthy. And often it has not been funded in these organizations. When we do a large acquisition or build out a new set of systems, we look at the technical debt we leave in the organization and say, we can fix that in post.

[00:06:42] - [Speaker 1]
And post is here. The time is now, because AI is looking at this data. It's finding data that's in our graph. It's accessing data in our systems, like Snowflake or Databricks, and coming back. And we're asking, well, where did that come from?

[00:07:00] - [Speaker 1]
Where did these insights come from? Where did you find that data? That's really getting back to the fundamentals: data preparation and data quality lead to data integrity.

[00:07:12] - [Speaker 0]
And what would you say the biggest obstacle to AI success is right now? And as we look towards the end of 2026, 2027, and beyond, do you see that changing?

[00:07:27] - [Speaker 1]
I think it truly comes down to data quality: building out high-quality data with a strong semantic layer. What I've been seeing is that we do a lot of pilots now. We have a hypothesis, and we're going to go test it out.

[00:07:42] - [Speaker 1]
We have a tool we're going to work with, and we build a well-contained pilot or proof of concept that sits within its walled garden. We curate the data for it very carefully. It's closely observed by all the participants in the process, and it's deemed a successful pilot, ready to go to production. And production means setting it loose in the wild. Now it doesn't have the continuous observation that was going on during the pilot phase.

[00:08:14] - [Speaker 1]
It needs to move into a more autonomous mode, and the tooling isn't there to keep it within its bounds. Then we come back to it and say, well, now I'm getting disappointing results, and we start to poke into it. It's now working on a broader set of data, or a different set of users is coming at the model 90 degrees off from what we thought we were coding for, and our agents are suddenly returning very different results. So from an organizational perspective, it takes an investment in maintaining an agent once it goes into production, and the diligence to build and curate the dataset in context for it.
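
The pilot-to-production problem described here (broader data, users coming at the model from unexpected angles) is often watched with a distribution-drift check. The sketch below is an illustration, not Precisely's tooling: it computes a population stability index (PSI) between a categorical input seen during the pilot and the same input in production. The function names, the sample `region` data, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math
from collections import Counter


def population_stability_index(baseline, current, eps=1e-6):
    """PSI between two samples of categorical values.

    0 means identical distributions; larger values mean more drift.
    eps smooths categories missing from one sample to avoid log(0).
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    base_counts, cur_counts = Counter(baseline), Counter(current)
    psi = 0.0
    for category in set(base_counts) | set(cur_counts):
        p = base_counts[category] / len(baseline) + eps
        q = cur_counts[category] / len(current) + eps
        psi += (q - p) * math.log(q / p)
    return psi


def drift_alert(baseline, current, threshold=0.2):
    """True when production inputs drift past the (assumed) threshold."""
    return population_stability_index(baseline, current) > threshold


# Hypothetical example: the pilot only ever saw US and UK customers.
pilot_regions = ["US"] * 90 + ["UK"] * 10
production_regions = ["US"] * 50 + ["UK"] * 20 + ["DE"] * 30
print(drift_alert(pilot_regions, production_regions))  # True
```

Drift alone does not prove the agent's answers are wrong; it flags that the agent is now seeing data it was never curated or evaluated against, which is the moment the continuous observation from the pilot needs to come back.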

[00:09:04] - [Speaker 0]
And for listeners who find your words really resonating right now, how can organizations address the critical data integrity gaps we're talking about today? Because they keep persisting. I suspect we've got people around the world nodding in agreement here, but what should they be doing to address this?

[00:09:21] - [Speaker 1]
I think of this like a layer cake. We start off with the source systems that emit the data we want to use. In this day and age, those are often SaaS applications; it used to be on-prem and cloud systems that we were working with. But each of those builds its own data silo.

[00:09:40] - [Speaker 1]
So first of all, we need a plan for creating a cohesive data fabric, where system one and system two exchange identities so that I can watch a transaction flow from one to the other and see the totality of that record. Within Precisely, we start with those source systems. We use a methodology called ELT: extract, load, transform. For me, the industry's move from ETL to ELT was a really transformational change. It meant we could get the data in its native form, transform it after the fact, and make it fit for purpose.

[00:10:24] - [Speaker 1]
We typically do that in zones. We land the data raw. We build out our cohesive, component data within there. And at the end of that, we build out the golden layer. This is where our data products live.
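
To make ELT-in-zones concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in warehouse (Snowflake and Databricks are mentioned in the conversation; SQLite is used only so the example runs anywhere). Source rows are landed raw as text, transformed inside the store into a cohesive zone, and then shaped into a golden-layer data product. All table and column names are hypothetical.

```python
import sqlite3


def run_elt(source_rows):
    """Land rows raw, then transform in place: raw -> cohesive -> golden."""
    con = sqlite3.connect(":memory:")
    # Raw zone: load exactly what the source emitted; no cleanup at load time.
    con.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)
    # Cohesive zone: the "T" happens after loading, inside the store.
    con.execute("""
        CREATE TABLE clean_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
    """)
    # Golden zone: a fit-for-purpose data product (bookings per customer).
    con.execute("""
        CREATE TABLE gold_bookings AS
        SELECT customer, SUM(amount) AS total_bookings, COUNT(*) AS orders
        FROM clean_orders
        GROUP BY customer
    """)
    return con


rows = [("1", " acme ", "100.0"), ("2", "Acme", "50"), ("3", "Globex", ""), ("4", "globex", "25.5")]
con = run_elt(rows)
print(con.execute("SELECT * FROM gold_bookings ORDER BY customer").fetchall())
# [('ACME', 150.0, 2), ('GLOBEX', 25.5, 1)]
```

Because the raw zone is left untouched, a faulty transform can be corrected and re-run without going back to the source system, which is one practical reason to transform after the fact.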

[00:10:37] - [Speaker 1]
And a data product is really fit for purpose. How do we look at our opportunity and pipeline data? How do we look at bookings? How do we look at customer engagement? Each of those becomes a data product that then gets its own data catalog.

[00:10:53] - [Speaker 1]
The catalog documents the assumptions, the meaning, the origins, and the lineage of the data in it. That's roughly where we had gotten to when we were building for BI. As we move into AI, the missing layer was a semantic layer, and that's really where I see us building now. If you look at where we are on the map right now, we're building out semantic layers so that our agents have the context to apply this data consistently, so that we can get to autonomous agents.

[00:11:30] - [Speaker 1]
Without that context layer, without those semantics built on top of the catalog, we're going to have agents operating autonomously but using their own intuition about what the data should mean, rather than the context the organization applies to it.
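
One way to picture a catalog with a semantic layer on top is a registry that records each data product's owner, lineage, and assumptions, and maps business terms to governed definitions that an agent must look up rather than guess. This is a hypothetical structure for illustration, not a description of Precisely's catalog; the class names, fields, and the use of SQL expressions as definitions are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one data product."""
    name: str
    owner: str
    lineage: list[str]  # upstream sources the product is built from
    assumptions: list[str] = field(default_factory=list)
    # Semantic layer: business term -> governed definition (a SQL expression here).
    terms: dict[str, str] = field(default_factory=dict)


class SemanticLayer:
    def __init__(self, entries: list[CatalogEntry]):
        self._entries = {e.name: e for e in entries}

    def define(self, product: str, term: str) -> str:
        """Return the organization's definition of a term, or fail loudly.

        Raising is deliberate: an agent should not fall back on its own guess.
        """
        entry = self._entries.get(product)
        if entry is None:
            raise KeyError(f"unknown data product: {product}")
        if term not in entry.terms:
            raise KeyError(f"{term!r} is not defined for {product}")
        return entry.terms[term]

    def lineage(self, product: str) -> list[str]:
        return list(self._entries[product].lineage)


bookings = CatalogEntry(
    name="bookings",
    owner="finance-data",
    lineage=["crm.opportunities", "erp.invoices"],
    assumptions=["Amounts are in USD"],
    terms={"net_bookings": "SUM(amount) - SUM(refunds)"},
)
layer = SemanticLayer([bookings])
print(layer.define("bookings", "net_bookings"))  # SUM(amount) - SUM(refunds)
```

The design choice worth noting is the refusal path: an undefined term raises instead of returning a best guess, which is the opposite of an agent applying "its own intuition" to the data.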

[00:11:49] - [Speaker 0]
And as a data whisperer with so much experience in this field, I'm curious: was there anything in the report that surprised you, that caught you off guard? You've probably seen so many trends over the years, but did anything particularly surprise you this time?

[00:12:06] - [Speaker 1]
I was struck by the disconnect in the posturing. When we asked organizations, are you AI ready, 87% came back and said, we are AI ready; we have all the tools and we have the infrastructure. Then we turned that question right back around and asked, what's your biggest barrier? And they said, well, it's the tools. It's the infrastructure.

[00:12:28] - [Speaker 1]
So there's a confidence that we're projecting to external audiences, but when we ask how we get down and execute on it, that's where we see the data integrity gap. The other thing that was heartening for me in the study was that organizations that had existing data governance programs and folded AI governance into data governance were far more successful than those building out separate AI governance programs. There's a real lesson in this: data governance and AI governance are not two separate entities.

[00:13:10] - [Speaker 1]
They're really part of the overall use of data and how we make that autonomous, and those two programs should fold together.

[00:13:19] - [Speaker 0]
And many leaders listening now naturally expect to see fast ROI from their AI projects (tech does love a good acronym; there are two right there). But relatively few organizations have clear metrics tied to business KPIs. So why do you think measuring AI value is proving so challenging for some organizations, and how are they currently approaching it wrong?

[00:13:43] - [Speaker 0]
Are they not measuring the right things? What's happening here?

[00:13:46] - [Speaker 1]
I think there's truly a disconnect, because many of the metrics I see organizations discussing are volume metrics. How many queries did we run? How many new agents did we release? How many tokens did we use? I can't believe that "token maxing" is a term we're treating as a measure of success.

[00:14:08] - [Speaker 1]
And it's not tied back to the four fundamentals. Is it improving our revenue? Is it decreasing cost? Is it improving service?

[00:14:16] - [Speaker 1]
Is it decreasing risk? Tying AI back to those actual outcome metrics is where the maturity gap sits, and I'm not seeing it happen. We're designing these agents and throwing them out there, and they're not tied back to real business metrics that make a difference.

[00:14:37] - [Speaker 0]
I guess the big question, which is almost a podcast episode on its own and impossible to answer properly here: how can a business get the best out of its AI investments, and how can it ensure that AI delivers measurable value?

[00:14:53] - [Speaker 1]
I think it's going back to cocreation. When you have your business leaders and your technology leaders working together, you start with the business outcome you want to achieve and design the entirety of the agent, and the experience around it, to prove that we did what we said we were going to do. That means ensuring we have the right inputs into the agent, that we're collecting the right components, and that we build observability into our agents that lets us measure both the relevance, coherence, and correctness of the agent itself and the outcomes that come out of it. That is, I think, a step we're missing in the process right now. We're very focused on how we get the agent to achieve its task, and not on how we collect the data that will let us prove it's operating with integrity and achieving the outcomes we're looking for.

[00:15:56] - [Speaker 1]
It's a missed step in the process. I think we're trying to get to the flashy outcome without all the diligent work along the way.
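
The contrast between volume metrics and outcome metrics can be made concrete with a small observability record per agent run. In this sketch, the field names, the `KPIS` tuple, and the idea of attaching KPI deltas to each run are assumptions for illustration. Token counts are still captured, but the summary also reports correctness and rolls results up against the four fundamentals named above: revenue, cost, service, and risk.

```python
from dataclasses import dataclass, field

# The four business fundamentals from the conversation.
KPIS = ("revenue", "cost", "service", "risk")


@dataclass
class AgentRun:
    """One observed agent run (hypothetical schema)."""
    tokens: int
    correct: bool  # e.g. judged against a human-reviewed answer
    kpi_deltas: dict[str, float] = field(default_factory=dict)


def summarize(runs: list[AgentRun]) -> dict:
    """Report volume metrics alongside correctness and KPI-linked outcomes."""
    unknown = {k for r in runs for k in r.kpi_deltas if k not in KPIS}
    if unknown:
        raise ValueError(f"outcomes not tied to a business KPI: {sorted(unknown)}")
    return {
        "volume": {"runs": len(runs), "tokens": sum(r.tokens for r in runs)},
        "correctness": sum(r.correct for r in runs) / len(runs) if runs else 0.0,
        "outcomes": {k: sum(r.kpi_deltas.get(k, 0.0) for r in runs) for k in KPIS},
    }


runs = [
    AgentRun(tokens=1200, correct=True, kpi_deltas={"cost": -40.0}),  # negative cost = savings
    AgentRun(tokens=800, correct=False),
    AgentRun(tokens=1500, correct=True, kpi_deltas={"revenue": 250.0, "cost": -15.0}),
]
report = summarize(runs)
print(report["outcomes"])  # {'revenue': 250.0, 'cost': -55.0, 'service': 0.0, 'risk': 0.0}
```

Rejecting outcome labels that do not map to a business KPI is one way to keep "how many tokens did we use?" from standing in for "did this improve revenue, cost, service, or risk?"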

[00:16:07] - [Speaker 0]
And I guess it's slightly unsurprising that organizations with strong data governance are the ones reporting higher trust in their data. But what is the real-world impact of that trust when it comes to AI outcomes, and how do you think agentic AI will affect this? Because, again, it's a big topic this year.

[00:16:25] - [Speaker 1]
Yeah. To go back to the early Internet years, we're in the 1.0 experience right now. We're building summarization agents. It's very human-oriented, human-initiated AI. Where this is going is autonomous AI.

[00:16:44] - [Speaker 1]
I almost think of it as building out an agent as an employee. We're going to give it a job and give it supervision, but expect it to work behind the scenes and report back when it needs help. That pivots our view of how we engage with AI, from summarization and contextualization agents to autonomous agents operating in the background, with us able to confidently observe their outcomes.

[00:17:21] - [Speaker 1]
And the tech's not there yet. The stack that allows us to govern, manage, and observe doesn't exist. We're so focused on the build and the agent component right now.

[00:17:34] - [Speaker 0]
And in a world where, as leaders, workers, and consumers, we're bombarded with so much information, it can feel incredibly overwhelming. So I always try to give everyone listening a few valuable takeaways. What would you say are a few governance best practices that organizations must prioritize right now? Because it's easy to get distracted by the shiny big features.

[00:17:56] - [Speaker 0]
But what should they be prioritizing now in terms of data governance, do you think?

[00:18:00] - [Speaker 1]
Yeah. Here's what I tell my team: governance has to produce automation, not documents. Policies don't scale AI. Automated controls do: quality checks embedded in the pipelines, access rules, lineage tracking, and privacy enforcement, all baked into CI/CD.

[00:18:19] - [Speaker 1]
This whole concept around MLOps is what actually moves the needle. When I see a governance program focused on a 100-page document about our AI policies, I say that has to be baked into the code. When governance is in code, it accelerates innovation rather than slowing it down.

[00:18:40] - [Speaker 1]
And I think that's the mindset shift we need to make.
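
As a rough illustration of governance in code, the sketch below expresses two data-quality rules and one privacy rule as executable checks that run inside a pipeline step. If any rule fails, the step raises, so a CI/CD job running it would fail instead of shipping bad data. The rule set, field names, allowed-country list, and masking format are hypothetical placeholders, not a policy from the report.

```python
class QualityGateError(Exception):
    """Raised when records violate a governance rule."""


# Hypothetical policy, written as code instead of prose.
RULES = {
    "customer_id": lambda v: v is not None and str(v).strip() != "",
    "country": lambda v: v in {"US", "GB", "DE", "FR"},
}
PII_FIELDS = {"email"}  # masked before data leaves the pipeline


def enforce(records):
    """Validate every record, then return copies with PII masked."""
    failures = []
    for i, rec in enumerate(records):
        for column, check in RULES.items():
            if not check(rec.get(column)):
                failures.append(f"row {i}: {column}={rec.get(column)!r}")
    if failures:
        # Fail loudly: an uncaught exception gives a non-zero exit code in CI.
        raise QualityGateError("; ".join(failures))
    return [
        {k: ("***" if k in PII_FIELDS and v else v) for k, v in rec.items()}
        for rec in records
    ]


print(enforce([{"customer_id": "C1", "country": "US", "email": "a@example.com"}]))
# [{'customer_id': 'C1', 'country': 'US', 'email': '***'}]
```

Because the policy lives in version-controlled code, changing a rule goes through the same review and test process as any other change, which is what lets governance keep pace with the pipelines it protects.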

[00:18:44] - [Speaker 0]
And I think over the last twelve months, we've all heard the phrase no data, no AI. So how can organizations better ensure that they provide that highest quality data for their AI implementations? Any tips there?

[00:18:57] - [Speaker 1]
I think you have to look at your landscape. Most of us have a fairly complicated data landscape: multiple tools that are specific to a functional purpose, and SaaS applications that are functionally isolated but that we need overall visibility into. So the first step is building a cohesive data landscape, where your apps perform the tasks they need to, but you're building a data catalog off of them. If you haven't built that inventory, if you haven't looked at how the systems identify the same entity and built a cross-reference between them into your integration, that's where you need to start, because most of the datasets we want to work with are not singular in nature. I call it the power of and.

[00:19:54] - [Speaker 1]
We want to combine the data from our CRM system with the data from our ERP system. That means we need a complete exchange of identities between the systems, managed in a consistent fashion. That data catalog and the fundamental architecture let you get to the next step. A lot of the agents we're working with today actually work off established data products from our data catalog rather than the native data that sits in the application. I hear a lot of pressure now along the lines of, can't you just spin up an MCP server?

[00:20:31] - [Speaker 1]
Let me get directly into the CRM data, or the ERP data, and I'll just go at it. What we focus on instead is building out the catalog experience, so that we're already creating a cohesive, denormalized view of those two systems, with enrichment in it, governance baked into it, and observability, so we can see where the data is aligned. That lets the agent work with clean data. So the "spin up an MCP server and go at the system" view of the world is, you know, kind of shortsighted.
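
The "power of and" depends on knowing which CRM record and which ERP record describe the same customer. Below is a minimal sketch, assuming the integration layer maintains an explicit crosswalk from CRM IDs to ERP IDs; the data shapes, IDs, and field names are hypothetical. It builds a denormalized customer data product from both systems and reports unmatched records instead of silently dropping them, the kind of curated view an agent could read instead of querying each application directly.

```python
def build_customer_product(crm_accounts, erp_customers, crosswalk):
    """Join CRM and ERP records through an explicit identity crosswalk.

    crm_accounts:  {crm_id: {"name": ..., "pipeline_value": ...}}
    erp_customers: {erp_id: {"invoices": [amount, ...]}}
    crosswalk:     {crm_id: erp_id}, maintained by the integration layer
    """
    product, unmatched = [], []
    for crm_id, account in crm_accounts.items():
        erp_id = crosswalk.get(crm_id)
        if erp_id is None or erp_id not in erp_customers:
            unmatched.append(crm_id)  # surface identity gaps; don't hide them
            continue
        erp = erp_customers[erp_id]
        product.append({
            "crm_id": crm_id,
            "erp_id": erp_id,
            "name": account["name"],
            "pipeline_value": account.get("pipeline_value", 0.0),
            "invoiced_total": sum(erp.get("invoices", [])),
        })
    return product, unmatched


crm = {"A-1": {"name": "Acme", "pipeline_value": 5000.0}, "A-2": {"name": "Globex"}}
erp = {"900": {"invoices": [1200.0, 300.0]}}
product, unmatched = build_customer_product(crm, erp, {"A-1": "900"})
print(unmatched)  # ['A-2']
```

The unmatched list is itself a data-integrity signal: every entry is a customer the agent would otherwise see only half of.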

[00:21:12] - [Speaker 0]
And we'll have people listening who subscribe to tech newsletters, listen to podcasts, and are continuously involved in forums and tech discussions. As I said, it can get incredibly overwhelming with so much information. So for anyone listening from an organization that wants to make big changes, if they could remember one big takeaway from our discussion today, and indeed from your report, what do you think it should be?

[00:21:38] - [Speaker 1]
Data integrity comes first. Autonomous AI amplifies whatever data it's fed, good or bad. So you have to start your AI programs by understanding where in your data landscape you have quality, curated data in context, and by having your semantic layer ready to make that consistent. If you don't have that, you're going to get poor results from your agentic experience. The other thing I would say is define clear ownership.

[00:22:08] - [Speaker 1]
Assign responsibility for the data inputs, the model behavior, and the oversight, and build those guardrails into your data products. They have to scale with new regulations and fast-changing use cases. So from an organizational perspective, you have to balance speed with control. We're constantly being bombarded with new components and new capabilities. I used to say that six months ago, organizations were learning to spell AI.

[00:22:37] - [Speaker 1]
Now every single one of my users has access to Copilot, Claude, or other models, and they're looking to unlock the value in them. As we move with speed and explore these new use cases, we have to ensure that what we're doing has the safeguards to prevent bias and protect us from risk.

[00:23:04] - [Speaker 0]
Well, thank you so much for sitting down with me today. For everybody listening, I'm going to include a link in the show notes to the 2026 State of Data Integrity and AI Readiness Report. I do urge people listening to go check it out. There are so many big stats in there. It's a great read.

[00:23:20] - [Speaker 0]
I'll also include a link to your LinkedIn and to the Precisely website, but more than anything, thank you for joining me today and bringing all this to life. Thank you so much.

[00:23:29] - [Speaker 1]
Neil, I appreciate it. Thanks for having me on.

[00:23:32] - [Speaker 0]
I think the big takeaway here is that AI success begins long before anyone launches an agent or tests a new model. Dave brought the conversation right back to the work that many organizations would rather avoid: data definitions, profiling, enrichment, governance, semantic layers, and observability. These things don't make the headlines, but they ultimately decide whether AI can be trusted in the real world. I also liked his point about governance needing to produce automation rather than just documents. A 100-page policy will not scale with autonomous AI; controls need to be built into the pipelines. That means quality checks, access rules, privacy enforcement, and indeed the way data products are created and monitored.

[00:24:24] - [Speaker 0]
So for business leaders listening today, there is a clear warning here. Measuring token usage, query volume, or the number of agents launched all sounds productive, but those numbers mean very little unless they connect to real business outcomes. So over to you. Is AI improving your revenue? Is it reducing cost, improving service, or reducing risk?

[00:24:49] - [Speaker 0]
This is where the conversation has to go. Dave's message today is clear: data integrity comes first. Autonomous AI will amplify the data it receives, which ultimately means poor data will create poor outcomes faster than ever before. Or, to put it more bluntly, garbage in, garbage out.

[00:25:11] - [Speaker 0]
I'll include links to Precisely, Dave's LinkedIn profile, and the 2026 State of Data Integrity and AI Readiness Report in the show notes, and you can find a blog post associated with this episode over at techtalksnetwork.com. So please, let me know what you think. Is your business moving too quickly with AI agents before truly understanding the quality and context of the data behind them? How are you getting around this? What are you doing?

[00:25:41] - [Speaker 0]
Let me know. And while you marinate on that, I'm gonna walk off into the sunset, but I will return again bright and early tomorrow. Thanks for listening. Speak to you then. Bye for now.