AI - Beyond the Hype
AI - Beyond the Hype is a podcast for senior executives, technology leaders, and data professionals who want a clear-eyed view of what it really takes to make AI work in the enterprise.
Each short episode is designed for easy consumption by busy leaders and executives, offering concise, practical conversations on the foundations behind successful AI adoption — from data quality and observability to governance, operating models, architecture, and trust. Through thoughtful, conversational dialogue, the show connects executive priorities with the technical realities that determine whether AI delivers meaningful value or simply creates more noise.
If your organisation is asking big questions about AI readiness, digital transformation, and data-driven decision-making, this podcast is designed to help you quickly separate what sounds impressive from what actually works.
The Invisible Architecture: Why Data Modelling Is the Make-or-Break for Enterprise AI
Sarah and James unpack a question most AI programmes never ask early enough: is the data actually modelled? Drawing on recent benchmarks, documented enterprise failures, and hard ROI evidence, they explore why AI accuracy drops to zero without proper data foundations, why 80% of AI projects stall on data — not algorithms — and what leaders can do about it. From the London Whale to Walmart's checkout fiasco, this episode puts data modelling in the language of business risk, competitive advantage, and AI readiness.
References:
- A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases
https://arxiv.org/abs/2311.07509
- The Consequences of Poor Data Quality: Uncovering the Hidden Risks
https://www.actian.com/blog/data-management/the-costly-consequences-of-poor-data-quality/
- The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed
https://www.rand.org/content/dam/rand/pubs/research_reports/RRA2600/RRA2680-1/RAND_RRA2680-1.pdf
- Generative AI Benchmark: Increasing the Accuracy of LLMs ...
https://data.world/blog/generative-ai-benchmark-increasing-the-accuracy-of-llms-in-the-enterprise-with-a-knowledge-graph/
- How a Single Source of Truth for Data Unlocks Growth ...
https://vizule.io/single-source-of-truth-data/
- Is a Semantic Layer Necessary for Enterprise-Grade AI Agents?
https://www.tellius.com/resources/blog/is-a-semantic-layer-necessary-for-enterprise-grade-ai-agents
- The Impact of Poor Data Quality (and How to Fix It)
https://www.dataversity.net/articles/the-impact-of-poor-data-quality-and-how-to-fix-it/
- Impact of Poor Data Quality on Business Performance: Challenges, Costs, and Solutions
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4843991
- The ROI of Data Modeling ...
https://sqldbm.com/blog/the-roi-of-data-modeling-speaking-to-the-c-suite-using-business-metrics/
- Master Data Management Case Study: Luxury Retail Transformation
https://flevy.com/topic/master-data-management/case-master-data-management-enhancement-luxury-retail
- MDM case study: The value of the Golden Record and mastering your data
https://qmetrix.com.au/case-study/mdm-case-study-the-value-of-the-golden-record-and-mastering-your-data/
- JPMorgan Chase London Whale C: Risk Limits, Metrics, and Models
https://elisch
SPEAKER_02Welcome back to the show. I'm James.
SPEAKER_00And I'm Sarah.
SPEAKER_02Today we're going to talk about something that, I'll be honest, does not sound exciting at first. Data modeling. And I can already feel half the audience reaching for the skip button.
SPEAKER_00I know. I know. It sounds like plumbing.
SPEAKER_02It is plumbing.
SPEAKER_00Yes. But here's the thing: it's the plumbing that determines whether your AI actually works or just looks like it works.
SPEAKER_02And that distinction matters a lot more than most leaders realize. So today we're going to walk through some research that honestly, when I first saw the numbers, I had to read them twice. Because the gap between having a proper data model and not having one isn't a marginal improvement. It's the difference between something functioning and something failing entirely.
SPEAKER_00Completely. And we should say up front, this episode isn't about convincing data engineers that data modeling matters. They already know. This is really for the business leaders, the executive sponsors, the people making investment decisions about AI programs, who might be wondering why their data team keeps asking for time and budget to do something that sounds abstract.
SPEAKER_02Right, "we need to model the data" sounds a lot less urgent than "we need to launch the AI."
SPEAKER_00Exactly. But the research tells a very different story.
SPEAKER_02So let's start with the headline number because this is the one that stopped me cold. There's a benchmark study, Sequeda and others, published on arXiv, that tested how well GPT-4 answers enterprise questions, the kinds of questions a business leader would actually care about. Revenue questions, KPI questions, strategy questions. And when the AI was pointed at raw database schemas with no data model, no knowledge graph, no semantic layer, it scored 0% on KPI and strategy questions. Not low, zero.
SPEAKER_00Zero. And I think that's worth sitting with for a moment. Because this isn't some obscure edge case. These are the questions that executives are most likely to ask an AI system. What's our customer retention rate by segment? What's the revenue trend for this product line? The things the AI program was probably funded to answer.
SPEAKER_02And with no model underneath, the AI literally cannot answer them.
SPEAKER_00Right. And when they added a knowledge graph, which is essentially a formal data model that maps out entities, relationships, and business definitions, overall accuracy went from 16% to 54%. And on those KPI questions specifically, it went from zero to about 36%.
SPEAKER_02Now, 36% still isn't perfect.
SPEAKER_00No, it's not. But it's infinitely better than zero. Literally infinitely better. And the point isn't that a knowledge graph solves everything overnight. The point is that without the modeling underneath, the AI has nothing to work with. It's guessing. And it's guessing badly.
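To make Sarah's description concrete, here is a minimal sketch of the difference between handing an LLM a bare schema and handing it a small knowledge-graph-style model of entities, relationships, and business definitions. Every name here (the tables, `entities`, `metrics`, `model_context`) is a hypothetical illustration, not the benchmark's actual setup.

```python
# A raw schema gives an LLM only cryptic table and column names.
raw_schema = "TBL_CST(cst_id, seg_cd); TBL_ORD(ord_id, cst_id, amt)"

# A knowledge-graph-style model adds the business meaning the schema lacks:
# entities, relationships, and agreed metric definitions.
entities = {
    "Customer": {"table": "TBL_CST", "key": "cst_id",
                 "description": "A billing account with at least one order"},
    "Order": {"table": "TBL_ORD", "key": "ord_id",
              "description": "A completed purchase"},
}
relationships = [
    ("Order", "placed_by", "Customer", "TBL_ORD.cst_id = TBL_CST.cst_id"),
]
metrics = {
    "revenue": {"definition": "SUM(TBL_ORD.amt)",
                "grain": "per Customer segment (seg_cd)"},
}

def model_context() -> str:
    """Render the model as grounding text to prepend to an LLM prompt."""
    lines = ["Entities:"]
    for name, e in entities.items():
        lines.append(f"- {name} ({e['table']}, key {e['key']}): {e['description']}")
    lines.append("Relationships:")
    for src, rel, dst, join in relationships:
        lines.append(f"- {src} {rel} {dst} via {join}")
    lines.append("Metrics:")
    for name, m in metrics.items():
        lines.append(f"- {name} = {m['definition']} at grain {m['grain']}")
    return "\n".join(lines)

print(model_context())
```

The point of the sketch is the asymmetry: the raw schema string is all the unmodeled AI gets, while the rendered context carries the definitions it would otherwise have to guess.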
SPEAKER_02So let me translate that for the boardroom. If you've approved an AI program and you haven't invested in the data model, you've essentially bought a very expensive system that cannot answer the questions you funded it to answer.
SPEAKER_00That's exactly it.
SPEAKER_02And this connects to a broader pattern, doesn't it? The failure rate across AI projects.
SPEAKER_00It does. The industry data, and this comes from the RAND Corporation and multiple industry surveys, shows that roughly 80% of AI projects either stall or fail outright. And the primary cause isn't the algorithms, it's not the models, it's the data.
SPEAKER_0280% failure. And the root cause is data, not AI.
SPEAKER_00Every time. And I think there's a really important distinction here that gets lost in the conversation. When people say data quality, executives sometimes hear, oh, we need to clean some spreadsheets. But what we're actually talking about is structural. It's about whether the data has been modeled, whether there are agreed definitions, consistent entities, traceable lineage from source to report. Without that structure, you're not cleaning data. You're just rearranging chaos.
SPEAKER_02I love that. Rearranging chaos. That's what a lot of organizations are doing, isn't it? They've got data teams working incredibly hard, but without the structural foundation, the work doesn't compound. It just repeats.
SPEAKER_00Exactly. And this is where it connects to something I get weirdly excited about, the semantic layer. So a semantic layer is basically a governed translation layer that sits between your raw data and everything that consumes it. BI tools, dashboards, AI agents. And it says, here is what revenue means, here is what customer means, here is the one authorized calculation for churn rate. Without it, every team, every tool, every AI agent is interpreting those terms differently.
SPEAKER_02And that's the 17 definitions of customer problem.
SPEAKER_00That's exactly it. No exaggeration. It's been documented in large enterprises. 17 distinct data representations for the same concept of customer. And when your AI agent hits that, it doesn't know which one to use. So it picks one. And it might be wrong. And you don't know it's wrong because the answer looks plausible.
SPEAKER_02That's the dangerous bit. The AI doesn't come back and say, I'm confused. It just confidently gives you the wrong answer.
SPEAKER_00Confidently and fluently. That's what makes this so risky. A human analyst might flag the inconsistency. An AI just picks a path and runs with it.
SPEAKER_02So what does the semantic layer actually do to fix this?
SPEAKER_00The benchmarks from AtScale and dbt Labs show that when you put a proper semantic layer in place, where metrics are defined once and governed, AI-generated answers on natural language queries hit about 83% accuracy. Without it, the same queries either fail or produce inconsistent results depending on which table the AI happens to query first.
SPEAKER_0283% versus essentially random. That's a four to five times improvement.
SPEAKER_00It is. And the thing to understand is that building that semantic layer isn't some exotic new technology initiative. It's data modeling. It's doing the work to define what your entities and metrics actually mean, and then encoding that in a way the machines can use.
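The "define once, govern, serve through a single interface" idea Sarah describes can be sketched in a few lines. The metric names and definitions below are illustrative assumptions, not any vendor's actual API.

```python
# A minimal semantic layer: each metric has exactly one authorized
# definition, and every consumer (BI tool, dashboard, AI agent)
# resolves metrics through it instead of guessing from raw tables.
SEMANTIC_LAYER = {
    "revenue": "SUM(orders.amount)",
    "customer_count": "COUNT(DISTINCT customers.customer_id)",
    "churn_rate": "1.0 * churned_customers / customers_at_period_start",
}

def resolve_metric(name: str) -> str:
    """Return the single governed definition, or fail loudly.

    Failing loudly matters: an ungoverned consumer would silently pick
    one of several competing definitions and return a plausible wrong answer.
    """
    try:
        return SEMANTIC_LAYER[name]
    except KeyError:
        raise ValueError(f"'{name}' is not a governed metric; refusing to guess") from None

print(resolve_metric("churn_rate"))
```

The design choice worth noting is the refusal path: a semantic layer's value comes as much from rejecting undefined metrics as from serving defined ones.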
SPEAKER_02Alright, so let's talk about money because this is where I think a lot of business leaders start paying attention. What does the research say about the financial impact?
SPEAKER_00So, Gartner's research puts the average cost of poor data quality at about $15 million per organization per year. And at the macro level, the US economy alone absorbs roughly $3.1 trillion annually from bad data. For individual organizations, studies show data quality losses equivalent to 8-12% of annual revenue.
SPEAKER_028-12% of revenue. That's not a rounding error.
SPEAKER_00No. And that $15 million per year figure? That's things like rework in development, failed analytics projects, operational errors, missed sales opportunities. Gartner also found that organizations are missing about 45% of potential sales leads because of data issues. And employees, this one's striking, spend up to 27% of their time just correcting bad data. Not analysing it, not deriving insight from it, correcting it.
SPEAKER_02So more than a quarter of your workforce's analytical time is spent fixing data rather than using it. That's a productivity problem hiding inside a data problem.
SPEAKER_00That's exactly what it is.
SPEAKER_02And on the flip side, what's the return on actually doing the modelling?
SPEAKER_00The ROI analysis is really clear. Every dollar spent on data modelling returns approximately 30%. And projects that include formal data modelling deliver about 5% higher overall project ROI compared to those that skip it. Which might sound modest, but on a multi-million dollar program, 5% adds up fast.
SPEAKER_02It does. And I think the way to frame this for leaders is this isn't a cost center. This is an investment that has a measurable, documented return. And it's one of the few investments that also de-risks all your other data and AI investments.
SPEAKER_00That's a really good way to put it. The modeling isn't just valuable in itself, it makes everything built on top of it more likely to succeed.
SPEAKER_02Now I want to go to some real-world failures, because I think the abstract numbers are compelling, but the stories tend to be what makes it real for people. The JP Morgan London Whale case is probably the most famous.
SPEAKER_00It is, and it's a textbook example. In 2012, JP Morgan Chase's chief investment office had a trading risk model, the value-at-risk model, built on Excel spreadsheets with manual copy-paste data entry. No formal data model, no lineage controls, no way to trace where the inputs were coming from, or to validate the calculations end to end. A formula error caused what was actually catastrophic risk exposure to display as a minor fluctuation. The result was $6.2 billion in losses.
SPEAKER_02$6.2 billion? From a data modeling problem?
SPEAKER_00From a data modelling problem. And the post-incident analysis was explicit. It called for integration into a single, auditable data model as the foundational remedy. Not better traders, not better algorithms, a data model.
SPEAKER_02And this connects directly to regulatory requirements now, doesn't it? BCBS 239.
SPEAKER_00It does. BCBS 239. The Basel Committee's 14 principles for risk data aggregation is mandatory for all global systemically important banks, and it explicitly requires end-to-end data lineage as a foundational capability. You can't do lineage without a data model underneath. There's nothing to trace if you haven't defined the entities, the relationships, the flows. Organizations implementing proper lineage on top of formal data models are seeing 58% reduction in regulatory fine risk, 34% faster resolution of data quality issues, and 57% faster audit response times.
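Sarah's point that "you can't do lineage without a data model underneath" has a simple mechanical reading: once each dataset's upstream sources are declared in the model, end-to-end lineage is just a graph traversal. Here is a sketch with hypothetical dataset names, not any real bank's pipeline.

```python
# A modeled data flow: each dataset declares its upstream sources.
# Tracing a report back to source systems is then a depth-first walk.
UPSTREAM = {
    "risk_report": ["positions_mart"],
    "positions_mart": ["trades_cleaned", "reference_data"],
    "trades_cleaned": ["trades_raw"],
    "trades_raw": [],        # source system
    "reference_data": [],    # source system
}

def trace_lineage(dataset: str, seen=None) -> list:
    """Return every upstream dataset feeding `dataset`, depth-first."""
    if seen is None:
        seen = []
    for parent in UPSTREAM.get(dataset, []):
        if parent not in seen:
            seen.append(parent)
            trace_lineage(parent, seen)
    return seen

print(trace_lineage("risk_report"))
# → ['positions_mart', 'trades_cleaned', 'trades_raw', 'reference_data']
```

Without the `UPSTREAM` declarations, there is literally nothing to traverse, which is the episode's point: lineage is a property of a modeled flow, not a tool you bolt on afterwards.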
SPEAKER_02Those are not marginal improvements. That's a compliance team's dream.
SPEAKER_00And it's not just financial services. BP, for example, implemented Databricks Unity Catalog with column-level lineage across 270 workspaces serving over 10,000 users. That eliminated fragmented governance, improved discoverability, and gave them audit trails that simplified regulatory interactions across the entire organization.
SPEAKER_02So this isn't theoretical. These are large, complex organizations making this work at scale.
SPEAKER_00At real scale. Yes.
SPEAKER_02Let's come back to AI failures because I think there are some more recent examples that really drive the point home.
SPEAKER_00There are, and they span different industries. Unity Technologies in 2022. Corrupted data was ingested without validation and it destroyed their AI ad targeting model. The result was $110 million in losses and a 37% share price drop. No data lineage, no ingestion validation. The bad data just flowed straight through into the AI.
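The failure mode Sarah describes, corrupted rows flowing straight into a model, is exactly what an ingestion validation gate prevents. Here is a minimal sketch; the schema and field names are hypothetical, not Unity's actual pipeline.

```python
# An ingestion validation gate: rows are checked against the declared
# model of the data before they can reach any downstream AI system.
SCHEMA = {
    "user_id": lambda v: isinstance(v, str) and len(v) > 0,
    "impressions": lambda v: isinstance(v, int) and v >= 0,
    "spend_usd": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def validate_batch(rows: list) -> tuple:
    """Split a batch into (accepted, rejected) rather than letting bad rows flow through."""
    accepted, rejected = [], []
    for row in rows:
        ok = all(field in row and check(row[field]) for field, check in SCHEMA.items())
        (accepted if ok else rejected).append(row)
    return accepted, rejected

good, bad = validate_batch([
    {"user_id": "u1", "impressions": 12, "spend_usd": 0.40},
    {"user_id": "", "impressions": -3, "spend_usd": "corrupt"},  # corrupted row
])
```

The quarantined `bad` list is the crucial output: it makes corruption visible at the boundary instead of letting it silently degrade the model downstream.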
SPEAKER_02$110 million and a third of the share price. From one data quality failure.
SPEAKER_00Then there's IBM Watson for oncology. $62 million investment. The training data contained hypothetical cases instead of real patient data. The AI ended up producing erroneous and potentially life-threatening treatment recommendations. The root cause? The data used for training wasn't validated against a real-world ontology. There was no formal model of what good clinical data should look like.
SPEAKER_02And more recently, Walmart.
SPEAKER_00Walmart's AI instant checkout program with OpenAI, they catalogued 200,000 products. But only about 30 of them, 30, were actually transactable by the AI agent. Because the product data had incomplete attributes, inaccurate taxonomy, and unstructured metadata. The AI literally couldn't work with 99.985% of the catalogue.
SPEAKER_02That is spectacularly bad.
SPEAKER_00And it's not because the AI model was bad. The AI was fine. The data model, or rather the absence of one, was the problem. Without a proper dimensional product model with governed attributes and a clean taxonomy, the AI had nothing structured to work with.
SPEAKER_02So this is the pattern. The AI doesn't fail because the technology isn't good enough. The AI fails because the data underneath isn't modeled.
SPEAKER_00That's the consistent finding. And there's academic backing for this too. Sculley and colleagues at Google published a seminal paper at NeurIPS, Hidden Technical Debt in Machine Learning Systems, showing that ML systems carry massive invisible maintenance costs, and those costs are primarily driven by data dependencies. Tight coupling to poorly governed data sources creates boundary erosion, hidden feedback loops, undeclared consumers. ML projects carry twice the technical debt of non-ML projects, and the debt is predominantly in data pre-processing.
SPEAKER_02Twice the technical debt. And it's in the data, not the models.
SPEAKER_00Almost entirely in the data.
SPEAKER_02Alright, I want to zoom out to the strategic picture for a moment because I think we've made the case on risk and cost. But what about competitive advantage?
SPEAKER_00The research shows that 43% of companies report gaining competitive advantage specifically from model-driven analytics. And the MIT Sloan research found that companies experienced in analytics use are actively widening their competitive gap over those that are not. And critically, management support for analytics, including executive sponsors and top-down mandates, is what separates the leaders from everyone else.
SPEAKER_02So the competitive gap is growing, not shrinking.
SPEAKER_00It's growing. The companies that invested in foundations early are now compounding that advantage. Because every new AI use case, every new analytics initiative, every new data product builds on a solid base. The companies that skipped that investment are spending their time and money on rework, reconciliation, and firefighting.
SPEAKER_02That's where this gets real for boards. This isn't just about avoiding failures. It's about the cumulative strategic disadvantage of not investing in foundations. Every quarter you delay, the gap widens.
SPEAKER_00And it gets harder to close because the complexity compounds. Every new system, every new data source, every new AI initiative that gets built without a model underneath adds to the entropy. It doesn't just stand still, it gets worse.
SPEAKER_02Let's talk about what a leader should actually do with this. Because I think we've painted a fairly stark picture, and I don't want anyone walking away feeling paralyzed. What's the practical path forward?
SPEAKER_00I think there are a few things. First, if you're sponsoring an AI program, make data modeling a prerequisite gate. Not a nice to have, not a phase two, a gate. Before you train a model, before you deploy an agent, have your data team confirm that the entities, the metrics, and the lineage are defined and governed.
SPEAKER_02So it's a readiness check, essentially. Don't confuse buying AI with being ready for AI.
SPEAKER_00Exactly. Second, invest in a semantic layer. It doesn't have to be a massive program. Start with your top 10 or 20 business metrics, define them once, govern them, make them available to every tool and every AI system through a single interface. That alone will dramatically improve AI accuracy.
SPEAKER_02And it eliminates the "whose spreadsheet is right?" problem.
SPEAKER_00Completely. Third, build lineage from the start. If you're in financial services, BCBS 239 demands it anyway. But even outside regulated industries, lineage is what lets you trace errors, understand impact, and trust your data. You can't build trustworthy AI on data you can't trace.
SPEAKER_02And fourth, I'll add this one: make the business case visible. We've talked about a 30% return on modeling investment, 5% higher project ROI, $15 million in annual data quality costs that become recoverable. These numbers exist. Use them. Because one of the reasons data modeling doesn't get funded is that nobody puts it in the language of business outcomes.
SPEAKER_00That's such a good point. The data team often talks about modeling in technical terms, normalization, conformed dimensions, ontologies, and the business hears jargon. But when you say, this is why our AI answered 0% of KPI questions correctly, or this is why we lost 8% of revenue to bad data, that lands differently.
SPEAKER_02Completely differently. And I think there's a broader message here for anyone leading a data or technology function. You need to be the translator. You need to take these technical realities and express them as business risk, business opportunity, and business value. Because the investment case is strong. It's just not being made in the right language.
SPEAKER_00And the urgency is real too. Because every organization is either deploying AI now or about to. And if the foundations aren't there, those AI programs are going to hit the same wall that 80% of them are already hitting. The technology will work. The data won't support it.
SPEAKER_02So to bring it home, if there's one thing a leader takes from this episode, what would you want it to be?
SPEAKER_00That data modeling isn't the boring prerequisite you do before the exciting AI work. It is the work that makes AI work. The Sequeda benchmark shows it. 0% accuracy without it, measurable accuracy with it. The enterprise failures show it. Unity, Watson, Walmart. The ROI shows it. 30% return on every dollar. The evidence is overwhelming and consistent.
SPEAKER_02And the way I'd frame it for a board conversation, you're not choosing between investing in AI and investing in data modeling. You're choosing between AI that works and AI that doesn't. Because better AI still starts with better foundations.
SPEAKER_00Always has. Always will.
SPEAKER_02If this one hit home, share it with someone who's making investment decisions on AI right now. They need to hear this before the budget's spent and the foundations are still missing. Thanks for listening to our second episode of AI - Beyond the Hype.
SPEAKER_00See you next time.