
Field note

The AI-generated-code security tax: shipping 4× faster, failing security tests nearly half the time

AI coding tools ship code ~4× faster, independent testing found ~45% of AI-generated code fails security tests, and security findings are rising even faster than velocity. The fix is a gate, not a brake.

Last updated July 1, 2026

Your team is shipping faster with AI coding tools. That part is real, and it isn’t going away. The problem is what the independent data says that code contains: in the largest controlled test we know of, 45% of AI-generated code failed security tests. If your tools behave like the ones tested, you are shipping roughly four times faster and close to half of what you ship introduces a known vulnerability class. That gap is a tax on velocity, and the way you stop paying it is a gate, not a brake.

The tax, in three independent datasets

I’m going to be precise about what kind of evidence each of these is, because it matters.

Veracode (controlled study). The 2025 GenAI Code Security Report tested code from over 100 LLMs across 80 coding tasks in Java, Python, C#, and JavaScript. 45% of samples failed security tests by introducing OWASP Top 10 vulnerabilities. Java was worst at 72%; 86% of the relevant samples failed to defend against cross-site scripting and 88% against log injection. Crucially, the March 2026 update found the pass rate flat at ~55% while coding-capability benchmarks kept climbing: better models, same security failure rate.
Apiiro (observational telemetry). Across enterprise codebases, AI-assisted developers committed 3–4× faster while monthly security findings rose from ~1,000 to over 10,000 in six months — privilege-escalation paths up 322%, architectural design flaws up 153%.
GitClear (observational telemetry). Across 211 million changed lines, copy-pasted (cloned) code rose from 8.3% to 12.3% (2020→2024), refactoring (measured as moved code) fell from 25% to under 10%, and duplicate code blocks increased roughly eightfold. The report attributes these trends to rising AI-assistant use.

The Veracode number is experimental and causal within its test harness. The Apiiro and GitClear numbers are observational — correlated with AI adoption, not proven to be caused by it. But three independent methodologies pointing the same direction is a signal worth acting on, not explaining away.

Why speed and risk scale together

This isn’t a story about AI being bad at code. It’s arithmetic. AI removed the friction that used to bound how much code a team produced, but it did nothing to the review capacity that used to vet it. So you get 3–4× the code volume against the same number of senior reviewers, and (per Veracode) the rate at which that code fails security tests hasn’t improved. More code at an unchanged failure rate means proportionally more vulnerable changes; spread those across a fixed human-review budget and the load on each reviewer rises just as fast. Apiiro’s observed surge, roughly 10× in six months against a 3–4× velocity gain, was steeper than proportional. The velocity you gained is exactly the pressure that breaks manual review.
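The arithmetic above can be made concrete with a toy model. This is a minimal sketch, assuming a hypothetical team of five reviewers and a hypothetical baseline of 100 changes a week; the only sourced input is Veracode’s 45% benchmark failure rate, and `review_load` is a name invented for this example.

```python
# Toy model of the review-capacity arithmetic. Every input is hypothetical except
# the 45% failure rate, which is Veracode's benchmark figure, not your codebase's.

def review_load(changes_per_week: float, velocity_multiplier: float,
                failure_rate: float, reviewers: int) -> tuple[float, float]:
    """Return (vulnerable changes per week, vulnerable changes per reviewer per week)."""
    changes = changes_per_week * velocity_multiplier  # more code from AI assistance
    vulnerable = changes * failure_rate                # unchanged failure rate
    return vulnerable, vulnerable / reviewers          # fixed review budget

before = review_load(100, 1.0, 0.45, reviewers=5)
after = review_load(100, 4.0, 0.45, reviewers=5)
print(f"before: {before[0]:.0f} vulnerable changes/week, {before[1]:.0f} per reviewer")
print(f"after:  {after[0]:.0f} vulnerable changes/week, {after[1]:.0f} per reviewer")
```

Because every term is linear, the per-reviewer load rises in exact proportion to the velocity multiplier: 4× the output means 4× the vulnerable changes landing on each reviewer. Real review load is lumpier than this, so treat it as a shape, not a forecast.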

The fixes that don’t work

Three responses are common and all fail:

Slow down or ban the tools. You throw away a real, compounding productivity gain, and your engineers will route around the ban anyway.
“Our senior engineers will catch it in review.” Human review is the bottleneck the tools just blew past. It does not scale to 4× throughput, and asking it to is how the findings pile up unread.
Wait for a better model. Veracode’s flat security curve is the direct refutation: capability gains have not been security gains.

The fix that does: a machine-verified gate between generation and merge

If the checking can’t stay manual, it has to become mechanical. The move is to shift verification from humans-in-review to machines-in-the-pipeline — a gate every AI-authored change must pass before merge:

What the data shows AI code does	The gate control that catches it
Introduces OWASP-class vulnerabilities ~45% of the time (Veracode)	SAST + policy-as-code that blocks merge on OWASP-class findings
Broadens permissions and adds design flaws (Apiiro +322% / +153%)	Least-privilege and infrastructure-as-code policy checks; architecture gates
Clones instead of reusing; skips refactoring (GitClear)	Duplication and complexity thresholds; require accompanying tests
Merges with no accountable owner	A human approval step at the gate, with a named owner
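The table’s controls can be expressed as a single merge decision. The sketch below is a minimal, hypothetical policy-as-code check, assuming scanner output has already been normalized into one shape; `Finding`, `Change`, `Policy`, `evaluate`, the category labels, and the 10% duplication threshold are all invented for illustration and do not correspond to any particular tool’s API. It covers the OWASP and permission findings, duplication, tests, and named-owner rows; a complexity threshold would slot in the same way.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical normalized record; real SAST, secret, dependency, and IaC scanners
# each emit their own formats that would need mapping into something like this.
@dataclass(frozen=True)
class Finding:
    tool: str       # illustrative labels, e.g. "sast", "secrets", "iac"
    category: str   # illustrative labels, e.g. "owasp:A03-injection", "iam:wildcard-action"
    severity: str   # "low" | "medium" | "high" | "critical"

@dataclass
class Change:
    findings: list[Finding]
    duplication_ratio: float   # share of added lines duplicating existing code
    tests_added: bool
    approver: Optional[str]    # named human owner who approved at the gate

@dataclass(frozen=True)
class Policy:
    blocking_prefixes: tuple[str, ...] = ("owasp:", "iam:", "secret:")
    blocking_severities: frozenset[str] = frozenset({"high", "critical"})
    max_duplication: float = 0.10  # hypothetical threshold
    require_tests: bool = True

def evaluate(change: Change, policy: Policy) -> list[str]:
    """Return every reason the change must not merge; an empty list means it may."""
    reasons = []
    for f in change.findings:
        # Block on any finding in a blocking category, or on any high/critical finding.
        if f.category.startswith(policy.blocking_prefixes) or f.severity in policy.blocking_severities:
            reasons.append(f"{f.tool}: {f.category} ({f.severity})")
    if change.duplication_ratio > policy.max_duplication:
        reasons.append(f"duplication {change.duplication_ratio:.0%} exceeds {policy.max_duplication:.0%}")
    if policy.require_tests and not change.tests_added:
        reasons.append("no accompanying tests")
    if not change.approver:
        reasons.append("no named human approver")
    return reasons

# Usage: a medium-severity injection finding still blocks, because its category is OWASP-class.
change = Change(
    findings=[Finding("sast", "owasp:A03-injection", "medium"),
              Finding("deps", "license:notice", "low")],
    duplication_ratio=0.04,
    tests_added=True,
    approver="alice",
)
print(evaluate(change, Policy()))
```

Returning reasons rather than a bare boolean tells a blocked author what to fix. The human approval is one more condition in the same gate, not a substitute for the automated ones.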

The principle is the one we build everything on: agents propose, verification and policy decide, and a human approves the merge — but the human approves at the gate, not by hand-inspecting every line. The gate scales with the volume of code; a reviewer’s afternoon does not. We run our own production infrastructure this way, which is the point of the build-as-proof writeup: the safety gate isn’t theoretical, it’s how the code behind this site ships.
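The duplication threshold in the table’s GitClear row is one of the cheaper controls to mechanize. Below is a naive sliding-window detector, assuming a five-line block size (an illustrative choice) and simple whitespace normalization; `windows` and `duplicated_blocks` are hypothetical names, and production clone detectors typically work on tokens or syntax trees rather than raw lines.

```python
from collections import defaultdict
from typing import Iterator

BLOCK = 5  # illustrative block size; an assumption, not a standard

def windows(source: str, size: int = BLOCK) -> Iterator[tuple[int, str]]:
    """Yield (first_line_number, block) for every run of `size` consecutive non-blank lines,
    with whitespace collapsed so indentation changes don't hide a copy."""
    lines = [(n, " ".join(text.split())) for n, text in enumerate(source.splitlines(), start=1)]
    lines = [(n, text) for n, text in lines if text]  # drop blank lines
    for k in range(len(lines) - size + 1):
        chunk = lines[k:k + size]
        yield chunk[0][0], "\n".join(text for _, text in chunk)

def duplicated_blocks(files: dict[str, str], size: int = BLOCK) -> dict[str, list[tuple[str, int]]]:
    """Map each block that occurs more than once to every (file, first_line) where it occurs."""
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, text in files.items():
        for start, block in windows(text, size):
            seen[block].append((name, start))
    return {block: locs for block, locs in seen.items() if len(locs) > 1}

body = "a = load()\nb = clean(a)\nc = score(b)\nd = rank(c)\nsave(d)\n"
files = {"x.py": body, "y.py": "import os\n" + body}
for block, locs in duplicated_blocks(files).items():
    print(locs)  # where the copied five-line block appears
```

Overlapping windows mean a ten-line copy is reported as several five-line blocks, so a gate would usually turn this into a ratio of duplicated lines before comparing it to a duplication threshold.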

The bottom line

The velocity is real and worth keeping. The 45% is also real, and it scales with the velocity unless something mechanical stops it. The teams that win with AI-assisted development won’t be the ones that used it most or least; they’ll be the ones that put a machine-verified gate between generation and merge before the findings compound.

If your team’s AI-assisted output has outrun your ability to review it safely, that is the problem our governed agentic-delivery work addresses: designing and building the verification gate (policy-as-code, security scanning, tests, and least-privilege checks) that lets you keep the speed without merging the vulnerabilities.

Questions this raises

Straight answers.

Is AI-generated code less secure than human-written code?: The largest controlled study we know of says yes, materially. Veracode's 2025 GenAI Code Security Report tested code from over 100 large language models across 80 tasks and found 45% of samples failed security tests by introducing OWASP Top 10 vulnerabilities — 72% for Java specifically. That's a property of how the models generate code, measured under controlled conditions, not an anecdote.
Won't a newer, more capable model fix the security problem?: The data says no. Veracode's March 2026 update found the security pass rate flat at roughly 55%, unchanged even as coding-capability benchmarks like HumanEval kept improving. Models got better at writing code that works; they did not get better at writing code that's safe. Betting your security posture on the next model release is betting against the trend line.
My team ships much faster with AI tools — am I accumulating a security or tech-debt bomb?: Very likely, and it's measurable. Apiiro's telemetry across enterprise repos found AI-assisted developers commit 3–4× faster while monthly security findings rose from ~1,000 to over 10,000 in six months, with privilege-escalation paths up 322% and design flaws up 153%. GitClear's analysis of 211M lines found code duplication rising and refactoring falling over the same AI-adoption period. These are observational — correlated with AI use, not proven caused by it — but the direction is consistent across independent datasets.
How do we keep the AI speed without shipping the vulnerabilities?: Move verification from humans-in-review to machines-in-the-pipeline: policy-as-code, SAST, dependency and secret scanning, tests, and least-privilege checks that run on every AI-authored change and block merge on violation. A human approves at that gate rather than hand-inspecting every line — so the checking scales with the volume of code the way manual review can't.
