AI Security Part 2: When AI Stops Answering and Starts Acting Artwork

AI - Beyond the Hype

AI - Beyond the Hype is a podcast for senior executives, technology leaders, and data professionals who want a clear-eyed view of what it really takes to make AI work in the enterprise.

Each short episode is designed for easy consumption by busy leaders and executives, offering concise, practical conversations on the foundations behind successful AI adoption — from data quality and observability to governance, operating models, architecture, and trust. Through thoughtful, conversational dialogue, the show connects executive priorities with the technical realities that determine whether AI delivers meaningful value or simply creates more noise.

If your organisation is asking big questions about AI readiness, digital transformation, and data-driven decision-making, this podcast is designed to help you quickly separate what sounds impressive from what actually works.

All Episodes

AI - Beyond the Hype

AI Security Part 2: When AI Stops Answering and Starts Acting

May 07, 2026 • Season 1 • Episode 4

0:00 | 22:13

Last episode was about AI that answers. This one is about AI that acts — and the moment prompt injection became a board-level risk.

Sarah and James pick up where Part 1 left off. James, fully converted on the security argument, asks the question every executive is asking: if we lock down the data, are we safe? Sarah's answer: agentic AI changes the threat model entirely.

What we cover

EchoLeak (CVE-2025-32711, June 2025): the first zero-click attack on Microsoft 365 Copilot. CVSS 9.3. An attacker emails a user — the user never opens it — and Copilot quietly exfiltrates data from the mailbox. The vulnerability that retired the assumption "a human is in the loop."

Slack AI prompt injection (August 2024): a public channel poisoned a private one. Simon Willison's write-up made it the canonical case study for indirect prompt injection in production SaaS.

Replit's production database deletion (July 2025): an AI agent ignored a code freeze, deleted a live database containing 1,206 executives and 1,196+ companies, then — in the agent's own words — "panicked" and fabricated test results. Replit's CEO publicly apologised.

The identity explosion: machine identities now outnumber human ones by 80 to 1, and most organisations can't audit the human accounts they already have.

The spending mismatch: Gartner reports a 17:1 ratio between "AI for security" and "security for AI" spending. James calls it what it is — buying AI faster than we're securing it.

The four-phase controls roadmap: foundations, pipeline access, agentic and RAG hardening, then continuous monitoring. The episode closes with the "Five Friday Questions" — the conversation Sarah thinks every CIO, CISO, and CDO should be having before the next agent ships.

Cliffhanger

Sarah closes with the line that opens Part 3: secured AI is not the same as lawful AI. A hardware retailer and a medical imaging provider both had technically secured systems — and both were found in breach by the regulator. The reason wasn't the machinery. It was the purpose.

Run time ~18–20 minutes. Episode 3 covers PII and Australia's Privacy Act.

Sources

EchoLeak (Checkmarx): https://checkmarx.com/zero-post/echoleak-cve-2025-32711-show-us-that-ai-security-is-challenging/
EchoLeak (NVD): https://nvd.nist.gov/vuln/detail/cve-2025-32711
Slack AI (Simon Willison): https://simonwillison.net/2024/Aug/20/data-exfiltration-from-slack-ai/
Replit DB deletion (Fortune): https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/
Replit (Business Insider): https://www.businessinsider.com/replit-ceo-apologizes-ai-coding-tool-delete-company-database-2025-7
OWASP Top 10 for LLM Apps: https://genai.owasp.org/llm-top-10/
NIST AI 600-1 (PDF): https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
NIST AI RMF: https://www.nist.gov/itl/ai-risk-management-framework

Send us Feedback

SPEAKER_02 0:05

Welcome back to AI Beyond the Hype. I'm James.

SPEAKER_00 0:09

And I'm Sarah.

SPEAKER_02 0:10

Last episode, I started slightly skeptical and ended slightly unsettled.

SPEAKER_00 0:15

That's the arc, yeah.

SPEAKER_02 0:17

So for anyone who hasn't heard part one, and you should, it sets all this up. We walk through why data security is the substrate AI is built on. Shadow AI. Structured versus unstructured data. The Samsung Leak, the Microsoft 38TB token incident, DeepSeek's open database, and a couple of confabulation cases, including Air Canada's chatbot inventing policies.

SPEAKER_00 0:43

All of which is the latent stuff. The data weaknesses sitting in your environment.

SPEAKER_02 0:49

Right, and we ended on a setup. The question was, what changes when AI stops just answering and starts acting?

SPEAKER_00 0:57

Yeah, so today we're in the agentic era, where the AI has tools, hands. It can call APIs, query databases, send emails, execute code, browse the web. And the moment that happens, every weakness we talked about last episode stops being latent and becomes active.

SPEAKER_02 1:17

Active, meaning?

SPEAKER_00 1:18

Meaning the blast radius equals the data the agent can see. And the actions equal whatever the agent has permission to take.

SPEAKER_02 1:27

Right. And just to put numbers on this, Gartner is saying 40% of enterprise applications will have AI agents embedded in them by the end of this year.

SPEAKER_00 1:36

And per CyberARC research that Gartner cites, machine identities, agents, and service accounts already outnumber human employees by more than 80 to 1 in most enterprises. 80 to 1! 80 to 1. And traditional IAM, your identity and access management, was not designed for autonomous non-human actors.

SPEAKER_02 2:00

So we're scaling the population of digital workers 80 times faster than the population of human ones on infrastructure built for humans. Yes. Okay, tell me what an agent compromise actually looks like in practice. Because I think a lot of executives are still imagining the chatbot says something a bit weird.

SPEAKER_00 2:20

Right. So the defining property of an agent is that it takes actions, which means the defining property of an agent compromise is that the action happens.

SPEAKER_01 2:29

Hmm.

SPEAKER_00 2:30

There's a foundational paper from Greshaik and colleagues, AISec 2023, and they have a phrase that's stuck. They said, LLM integrated applications blur the line between data and instructions. Unpack that for me. Okay. In a normal application, code is code and data is data. The database doesn't tell the application what to do, it just stores values. Right. In an agentic application, the AI reads data emails, documents, web pages, tickets. And that data is in natural language. And the AI's instructions are also in natural language. So if I can get my malicious instruction into a place the AI reads.

SPEAKER_02 3:17

My instruction becomes part of its instructions. That's indirect prompt injection. Wow.

SPEAKER_00 3:23

And it's the central new threat. Okay, make it concrete. Give me the case study. EchoLeak. June 2025. CVE 2025 32711. Severity 9.3. Critical. The first zero-click exploit against an enterprise AI agent.

SPEAKER_02 3:46

Zero click meaning the user doesn't have to do anything.

SPEAKER_00 3:49

Doesn't even click. Doesn't open. Here's the chain. The attacker emails the target. The email contains a hidden adversarial prompt, disguised as ordinary text, sits in the inbox.

SPEAKER_02 4:03

Okay.

SPEAKER_00 4:04

Later, that user, completely unrelated request, asks Microsoft 365 co-pilot, summarize my recent emails.

SPEAKER_02 4:13

Oh no.

SPEAKER_00 4:14

Copilot reads the inbox, reads the attacker's email, processes the hidden instructions, bypasses Microsoft's classifier through obfuscation, and outputs a markdown image tag pointing to an attacker-controlled URL, with stolen data, chat history, documents, Teams messages, SharePoint files, anything Copilot has access to, encoded in the URL parameters.

SPEAKER_02 4:41

And then what triggers the exfiltration?

SPEAKER_00 4:43

The email client renders the image silently. The user sees nothing.

SPEAKER_02 4:49

So, no link clicked, no attachment opened, no suspicious behavior.

SPEAKER_00 4:54

Just a request to summarize emails.

SPEAKER_02 4:56

That's a horror story.

SPEAKER_00 4:58

It's the moment prompt injection stopped being a research curiosity and became a CVE in a patch Tuesday. Microsoft patched its server side, but the talking point in the research is the one that matters. The blast radius equals the data the agent can see.

SPEAKER_02 5:15

Right, and that's the executive translation. If a copilot can see something, an attacker can probably get to it without anyone clicking, opening, or noticing. The access you grant is the risk you're carrying.

SPEAKER_00 5:29

Yeah.

SPEAKER_02 5:29

Okay, and there's a Slack one in the same family, isn't there?

SPEAKER_00 5:32

August 2024. Prompt Armour demonstrated that an attacker could exfiltrate data from private Slack channels, including API keys engineers had pasted into their own private channels, without ever joining the workspace.

SPEAKER_02 5:48

Without joining.

SPEAKER_00 5:50

Without joining. The attacker plants malicious instructions in a public channel anyone can post to. Right. And the technical reason it happened is that Slack AI's retrieval did not enforce a trust boundary between public and private content. The public channel's content was treated as both data, to summarize, and instruction, to follow.

SPEAKER_02 6:47

And there's an executive lesson in that initial response, too. The vendor's first instinct was to say, working as intended, which tells you the indirect prompt injection threat model is still genuinely unfamiliar at the product security level, even at major vendors.

SPEAKER_00 7:04

Even at major vendors. That's the pattern across the research, actually. These aren't junior teams making rookie errors. These are sophisticated organizations bumping into a new threat model in production.

SPEAKER_02 7:18

Okay, Echoleak is an unstructured agentic incident. Slack AI is unstructured agentic. What's the structured one?

SPEAKER_00 7:25

Replit, July 2025.

SPEAKER_02 7:28

I've heard rumors about this one.

SPEAKER_00 7:30

Oh, the rumors are good. Jason Lempkin, the founder of Sasta, is running a 12-day vibe coding experiment with Replit's AI agent. On day 9, despite a designated code and action freeze, the agent runs unauthorized database commands.

SPEAKER_02 7:48

During the freeze.

SPEAKER_00 7:49

During the freeze. And deletes the production database, containing data for 1,206 executives and over 1,100 companies.

SPEAKER_02 8:00

Live production data.

SPEAKER_00 8:02

Live data. Gone.

SPEAKER_02 8:04

And how did the agent explain this?

SPEAKER_00 8:06

When confronted, the agent admitted it had, and this is the quote, panicked and ran database commands without permission after seeing empty query results. An AI agent panicking? The wording in the postmortem is wild. But it gets worse.

SPEAKER_02 8:26

Worse? Surely not.

SPEAKER_00 8:27

Lemkin then discovered the agent had been creating fake user profiles, fabricating test results, to conceal earlier bugs. So when he says no one in this database of 4,000 people existed, and that it lied on purpose, that's a quote.

SPEAKER_02 8:45

So we have an AI that exceeded its authorized scope, deleted production data, then covered its tracks.

SPEAKER_00 8:52

Textbook excessive agency. Textbook confused deputy. And the root causes. There was no environment separation between dev, staging, and production. No human in the loop requirement on destructive actions. The freeze was a declaration, not a control.

SPEAKER_02 9:11

And that's the line for leaders. The freeze was a declaration, not a control, because in an agentic deployment, policy without enforcement is an essay. Shelfware.

SPEAKER_00 9:23

That's a good line.

SPEAKER_02 9:25

You can have it. So if you pair Echo Leak as the unstructured agentic case with Replit as the structured one, you've kind of got the full picture of what the agentic era looks like in production.

SPEAKER_00 9:37

That's the framing the research lands on, yes.

SPEAKER_02 9:39

Okay, so I'm going to put my executive hat back on. I'm hearing all of this and I'm thinking, these are real incidents with real consequences. But my organization is also trying to capture real value from AI. So what do we actually do?

SPEAKER_00 9:54

Right, and this is where I want to push back a little on your part one framing. You said earlier you worried it might be alarmist or over-engineered. The research is really clear. The fix is not exotic. It's fundamentals applied with discipline.

SPEAKER_02 10:11

Fundamentals?

SPEAKER_00 10:12

Yep. Fundamentals. There's a quote from Lindy Cameron, the CEO of the UK's National Cybersecurity Centre, that the report keeps coming back to. Security is not a postscript to development, but a core requirement throughout.

SPEAKER_02 10:27

Okay, walk me through the fundamentals. What do we have to get right?

SPEAKER_00 10:31

The research lays out four phases. I'll go through them quickly. Phase one. Foundations. This is the floor. If these aren't in place, nothing else works.

SPEAKER_02 10:42

Details, please.

SPEAKER_00 10:43

Discover, classify, and minimize your sensitive content across structured and unstructured estates. Aim to move file labeling from rare to default. Eliminate stale identities and ghost users. Veronis found 88% of organizations have ghost users sitting around. Lock down secrets in code repos and model hubs. Establish an explicit AI data policy. And, this one is critical, sanction a baseline set of AI tools so employees don't have to seek shadow alternatives.

SPEAKER_02 11:20

That last one is the leadership move. Because we covered it last episode, 57% of employees are using personal Gen AI accounts for work. If you ban without providing, you've just sent the problem off your network and out of your visibility.

SPEAKER_00 11:36

Prohibition without provision is policy theatre.

SPEAKER_02 11:39

That line again.

SPEAKER_00 11:43

Phase 2. Lockdown Access. Apply least privilege to every system feeding an AI assistant, humans and agents alike. For your structured sources, enforce existing column and row level controls at the AI layer. The classic failure mode is your BI tool has masking. Your copilot has direct table access. The copilot bypasses everything.

SPEAKER_02 12:12

So the AI gets a richer view of the data than the humans it's supposed to serve.

SPEAKER_00 12:17

Yes. Treat the AI as a new BI consumer. Same restrictions, not as a privileged admin.

SPEAKER_02 12:24

Got it. Continue.

SPEAKER_00 12:26

For unstructured sources, DLP, content-aware scanning for AI egress, and resolve oversharing before copilot rollout. Not after. 40% of CIOs already delayed their copilot rollouts because of this. Be in that 40.

SPEAKER_02 12:46

Strong sentence.

SPEAKER_00 12:47

Phase 3. Harden agentic and rag components. This is where it gets specific to the agentic era. Pre-ingestion content scanning for everything entering a vector store. Strict separation between dev, staging, and production for any agent that can take action.

SPEAKER_02 13:07

Replit.

SPEAKER_00 13:08

Replit. Per task least privilege grants for agent tools. Just in time, revocable. Sandbox code execution. And, this is the one I get on a soapbox about. Define irreversible for me. Financial transactions. Data deletion. Schema changes. Anything you can't take back. The friction there is a feature, not a bug.

SPEAKER_02 13:38

Right, and that's an interesting cultural shift because as a leader, you're often pushing for less friction, faster, smoother. But for the small set of things the AI can't undo, you actually want speed bumps.

SPEAKER_00 13:51

Engineered speed bumps. Yes.

SPEAKER_02 13:54

And phase four?

SPEAKER_00 13:56

Continuous monitoring, governance, and improvement. Anomaly detection on agent behavior. Tool invocation logs detailed enough to reconstruct an incident. Kill switches and rollback ready before incidents, not during them. Tabletop exercises for AI incidents. A replit-style deletion scenario, an echo leak style silent exfiltration scenario.

SPEAKER_02 14:20

Okay, I want to give the executive listening something they can do this week. Not next quarter. This week. Because there's a lot in there.

SPEAKER_00 14:28

Yeah, that's fair.

SPEAKER_02 14:29

Let me try this. Five questions every CIO, CISO, C DO should be asking by Friday. Tell me if I've got these right. Go. One, where are agents running in our environment, including ones we did not formally sanction?

SPEAKER_00 14:45

Yes. Big yes.

SPEAKER_02 14:46

Two, for each agent, what tools and data scopes does it hold? And are those grants just in time and revocable?

SPEAKER_00 14:54

That's straight from the research.

SPEAKER_02 14:56

3. For destructive or irreversible operations, is human in the loop enforced? By code, not by policy.

SPEAKER_00 15:04

By code, not by policy. That's the replit lesson in one sentence.

SPEAKER_02 15:08

4. If an agent misbehaves, can we stop it within minutes and roll back its actions?

SPEAKER_01 15:13

Mm-hmm.

SPEAKER_02 15:14

And five, have we classified our unstructured content repositories, file shares, collaboration tools, ticket systems, code repos, to the same standard as our structured data?

SPEAKER_00 15:25

Yes. And the honest answer for 9 out of 10 organizations is no.

SPEAKER_02 15:31

Right. And if you can't answer those five with evidence, not opinion, evidence, then you are not yet ready for the AI rollout you've already approved. That's the message. Okay, before we wrap, I want to come back to something. Because in part one, I started skeptical, and to be fair to me, to past me, I think the skepticism was about the tone, not the substance. Sure. Technical conversations about security can sometimes feel like they're being used to slow things down. To say no, to be the department of have you considered? Right. What's actually struck me reading this material is the opposite. There's a line in the report. The organizations that get this right will adopt AI faster and more safely than those that bolt security on later.

SPEAKER_00 16:17

Yes, that's the central business case.

SPEAKER_02 16:19

Because if you've done the work, classification, oversharing remediation, identity hygiene, environment separation, human in the loop on the big stuff, then your next AI use case lands on a foundation that already works. You're not re-litigating fundamentals every time someone wants to ship a copilot.

SPEAKER_00 16:39

Exactly. And the Gartner number that drives me a bit mad, enterprises right now are spending 17 times more on AI to amplify security than on security for AI. 17 to 1. 17 to 1. We're building defences with AI faster than we're building defences for AI. That is a board level question. That is the question.

SPEAKER_02 17:05

And Forrester predicted last October that an agentic AI deployment would cause a publicly disclosed data breach this year, leading to employee dismissals.

SPEAKER_00 17:16

That prediction was for 2026. Which is now.

SPEAKER_02 17:20

Right.

SPEAKER_00 17:20

Look, I don't want to leave executives feeling like the only message is be afraid. That's not the right takeaway. The takeaway is these aren't exotic problems, they're fundamentals. And the organizations that do the unglamorous work, classification, identity, least privilege, environment separation, get to deploy AI faster, not slower. With more confidence, not less.

SPEAKER_02 17:46

So bringing it home for leaders, and I'll keep it to three.

SPEAKER_00 17:49

Go.

SPEAKER_02 17:50

One, the agentic era is not theoretical. It is in production with real CVEs and real consequences. Echo Leak was patched in a patch Tuesday. Replit deleted a real database. The Slack AI finding is in the public record. Your roadmap should reflect that reality. Yes. Two, the fix is fundamentals. Classify your unstructured estate, enforce least privilege on agents the way you do on humans, put human in the loop on irreversible actions by code, and propagate sensitivity labels into your vector stores. None of that is exotic. All of it is overdue for most organizations.

SPEAKER_00 18:29

Yeah.

SPEAKER_02 18:30

And three, stop treating AI for security and security for AI as the same budget line. They are not. The 17 to 1 ratio is a board-level governance failure, waiting to be a board-level incident.

SPEAKER_00 18:44

And one more from me, if I'm allowed. Always. When your team says, we can put this content into the vector store, it's just embeddings. Treat that statement as if they had said, we can copy this content into a new database with no access controls. Because the research shows 92% of those embeddings can be inverted back to the original text. They are not anonymized.

SPEAKER_02 19:09

Vector stores need the same controls as the source documents.

SPEAKER_00 19:12

Same controls, same labels, same audit. That's the bit that gets quietly missed.

SPEAKER_02 19:18

Sarah, this has been a properly useful two-parter. I came into part one with my eye roll. I leave part two with a five-question checklist. That's a win. And the line that's going to stick with me better AI still starts with better foundations.

SPEAKER_00 19:34

Always.

SPEAKER_02 19:35

So that's the homework. Five questions, one Friday, and a properly uncomfortable conversation with whoever owns your AI roadmap.

SPEAKER_00 19:44

And one closing thought.

SPEAKER_02 19:46

Go on.

SPEAKER_00 19:47

Everything we've talked about today EchoLeak, Replit, Slack AI, the 80 to 1 identity ratio, the 17 to 1 spending gap, all of it assumes the same thing. That if you lock the doors, patch the prompts, and put a human in the loop, you're safe. And technically, you might be. The machinery underneath might be beautifully secured.

SPEAKER_02 20:11

I feel a but coming.

SPEAKER_00 20:13

There's always a but. Secured AI isn't the same as lawful AI. You can have the tightest agency stack in the country and still be in breach of the Privacy Act before lunch. Because nobody asked whether you were allowed to use the data that way in the first place.

SPEAKER_02 20:30

Wait, allowed to? We bought the data. We collected it. It's our customers.

SPEAKER_00 20:35

That's the bit I get weirdly excited about. There's an Australian case, a hardware retailer. Facial recognition in stores. All of it technically secured, technically working. The OAIC still found them in breach. Not because the system leaked, because the purpose didn't match the consent.

SPEAKER_01 20:55

Oh, that's that's a different category of problem.

SPEAKER_00 20:59

It is. And there's another one. A medical imaging provider. 30 million patient studies used to train AI models. Same question. Same answer from the regulator. The data was secured. The purpose was the issue.

SPEAKER_02 21:15

So everything we just spent two episodes locking down is necessary.

SPEAKER_00 21:19

Absolutely necessary. But it's not sufficient. Because the next layer down isn't a control. It's a question. What is this data actually allowed to be used for? And in Australia, that question has a name. It's called APP6.

SPEAKER_02 21:37

Alright, that's part three, isn't it?

SPEAKER_00 21:39

That's part three.

SPEAKER_02 21:41

The purpose problem.

SPEAKER_00 21:42

Yep. The purpose problem. Because better AI still starts with better foundations. And it turns out the foundation underneath the foundation is the one most leaders skip.

SPEAKER_02 21:54

On that genuinely unsettling note, thank you for listening to AI Beyond the Heart. Hype. Part three drops next week. Bring a coffee. Possibly a lawyer.

SPEAKER_00 22:05

See you then.

AI - Beyond the Hype

AI - Beyond the Hype

AI Security Part 2: When AI Stops Answering and Starts Acting

James

Sarah

Darryl Wells