Is prompt injection a real security threat, or just a demo?

It's real and it's been demonstrated end-to-end against a shipping product. EchoLeak (CVE-2025-32711), disclosed by Aim Security in June 2025 and rated CVSS 9.3 (critical) by Microsoft, was a zero-click flaw in Microsoft 365 Copilot that exfiltrated corporate data from a single crafted email. An independent research paper documenting it calls it the first real-world zero-click prompt-injection exploit in a production LLM system — which is the point at which 'just a demo' stops being an accurate description.

Can prompt injection happen without the user doing anything?

Yes. EchoLeak was zero-click: the victim only had to have Copilot process an email an attacker sent them — no link to click, no attachment to open. The malicious instructions were hidden in the email, pulled into the model's context when Copilot retrieved it, and acted on automatically. Anything that quietly ingests untrusted content into an LLM's context inherits that exposure.

Don't prompt-injection filters or classifiers solve this?

They help, but they are not sufficient on their own. Google's security team documented 'Task Injection,' where malicious sub-tasks are planted in a page as normal-looking content and slip past prompt-injection classifiers precisely because they don't look like an attack. Classifiers are one layer; you also need least-privilege tools, egress controls, and human confirmation on consequential actions, because any single layer will eventually be bypassed.

How do I know if my LLM feature is exposed to this?

If your feature reads content an attacker could influence — emails, documents, web pages, support tickets, PDFs, retrieved search results — into the same context as its own instructions, and can then reach data or take actions or make network calls, it has the EchoLeak shape. The model cannot reliably tell your instructions from attacker text in the same window, so the exposure is structural, not a tuning problem.

← Notes

Field note

EchoLeak: prompt injection is now a production vulnerability, not a research demo

A zero-click Microsoft 365 Copilot flaw (CVE-2025-32711, CVSS 9.3) turned prompt injection from a demo into real data exfiltration. What it means the moment your LLM touches untrusted input.

Last updated July 1, 2026

Prompt injection stopped being a demo the moment it got a CVE and exfiltrated real data from a shipping Microsoft product without anyone clicking anything. That moment was EchoLeak — CVE-2025-32711, rated CVSS 9.3 (critical) by Microsoft — a zero-click flaw in Microsoft 365 Copilot disclosed in June 2025 by Aim Security (whose research now lives at Cato Networks). If your team still files prompt injection under “interesting research,” this is the note that changes the filing.

The demo everyone dismissed became CVE-2025-32711

EchoLeak worked like this: an attacker sends an ordinary-looking email with instructions hidden in it. The victim does nothing. When Microsoft 365 Copilot later retrieves that email into its context — the ordinary behavior of a retrieval-augmented assistant — the model reads the hidden instructions as if they were part of its task, and follows them. In the documented exploit that meant exfiltrating the user’s data, by abusing a Microsoft Teams proxy that was on the content-security-policy allowlist — a permitted egress path the attacker didn’t have to break, just use.

The framing of the independent research paper documenting the exploit is the part to sit with: it describes EchoLeak as the first real-world zero-click prompt-injection exploit in a production LLM system, and prompt injection as “a practical, high-severity vulnerability class in production AI systems.” Microsoft fixed it server-side and there’s no evidence it was used in the wild — but the proof of concept is complete, and the technique is public.

Your LLM feature has the same shape

The reason EchoLeak matters to you has nothing to do with Copilot specifically. It’s the shape of the system. An LLM reads its instructions and its input from the same context window, and it cannot reliably tell one from the other. The instant that window contains content an attacker can influence — a retrieved email, a scraped web page, a customer-submitted ticket, a PDF in a knowledge base — that content can carry instructions the model will treat as legitimate.

So the diagnostic question isn’t “did we write a secure prompt.” It’s: does this feature ingest attacker-influenceable content, and can it then reach data or take an action? If both are true, you have the EchoLeak shape, and a strong model makes it worse, not better, because a more capable model follows the injected instructions more competently.

It’s a vulnerability class, and the class is moving

Treating EchoLeak as a single patched bug misreads it. Two signals say this is a category that will keep producing incidents:

The attack surface evolves faster than filters. Google’s security team documented Task Injection: malicious sub-tasks planted in a page as normal-looking content — a fake “solve this CAPTCHA to continue” step, for example — that slip past prompt-injection classifiers because they don’t read like an attack. Google found working versions against OpenAI’s Operator; they were patched, but the point is that classifier-only defenses have a bypass by construction.
The industry is codifying it. OWASP’s Top 10 for Agentic Applications now lists Agent Goal Hijack (ASI01) — manipulating what an agent is trying to do — and Tool Misuse and Exploitation (ASI02) — getting an agent to abuse tools it legitimately holds — as top-tier risks. Both are injection wearing different clothes.

The guardrails that actually address it

This is the Guardrails dimension of the production-readiness bar made concrete. No single control is sufficient; the defense is layered, and it’s mostly about limiting what the model can do, not just what it can be told.

The exposure	The control	Why a filter alone isn’t enough
Untrusted content enters the context as if it were instructions	Separate data from instructions; mark retrieved/tool content as untrusted and never as commands	Task Injection shows classifiers can be dressed to look like ordinary text
The model can reach data or actions beyond the task	Least-privilege tools and scoped permissions per feature	An injected instruction can only do what the model is allowed to do
A permitted egress path becomes an exfiltration channel	Egress allowlisting; treat outbound calls/links as sensitive	EchoLeak exfiltrated through an allowed domain, not a blocked one
Consequential actions run unattended	Human-in-the-loop confirmation on actions that move data or money	The one gate an attacker can’t inject their way past

The pattern is defense in depth: input controls, output/egress controls, least privilege, and a human on the actions that carry real consequences — because you are designing for the assumption that some injected instruction will get through.

Why this is a launch-blocker, not a footnote

For a regulated buyer, EchoLeak is the concrete version of the fear that already stalls launches: the moment an LLM feature touches untrusted input and can reach real data, injection is a live path to disclosure — the same exposure the HIPAA review turns into a legal one when the data is PHI. This is exactly what a security review should block on, and increasingly will.

If you’re de-risking a RAG or agent feature before launch and want to know whether it has the EchoLeak shape — and what it would take to close it — that’s the production-readiness audit: a fixed-scope assessment of your guardrails, tool scoping, and egress paths against the bar, with a prioritized risk register and a path to close the gaps before the feature ships.

Questions this raises

Straight answers.

Is prompt injection a real security threat, or just a demo?: It's real and it's been demonstrated end-to-end against a shipping product. EchoLeak (CVE-2025-32711), disclosed by Aim Security in June 2025 and rated CVSS 9.3 (critical) by Microsoft, was a zero-click flaw in Microsoft 365 Copilot that exfiltrated corporate data from a single crafted email. An independent research paper documenting it calls it the first real-world zero-click prompt-injection exploit in a production LLM system — which is the point at which 'just a demo' stops being an accurate description.
Can prompt injection happen without the user doing anything?: Yes. EchoLeak was zero-click: the victim only had to have Copilot process an email an attacker sent them — no link to click, no attachment to open. The malicious instructions were hidden in the email, pulled into the model's context when Copilot retrieved it, and acted on automatically. Anything that quietly ingests untrusted content into an LLM's context inherits that exposure.
Don't prompt-injection filters or classifiers solve this?: They help, but they are not sufficient on their own. Google's security team documented 'Task Injection,' where malicious sub-tasks are planted in a page as normal-looking content and slip past prompt-injection classifiers precisely because they don't look like an attack. Classifiers are one layer; you also need least-privilege tools, egress controls, and human confirmation on consequential actions, because any single layer will eventually be bypassed.
How do I know if my LLM feature is exposed to this?: If your feature reads content an attacker could influence — emails, documents, web pages, support tickets, PDFs, retrieved search results — into the same context as its own instructions, and can then reach data or take actions or make network calls, it has the EchoLeak shape. The model cannot reliably tell your instructions from attacker text in the same window, so the exposure is structural, not a tuning problem.

Production-Readiness Audit

This is the work, not just the writeup.

If this is your situation, the production-readiness audit is where it gets fixed — by the person who wrote this.

Request an audit