In the AI Era, Code Review Becomes Specification Review

When the PR Looks Fine but Still Feels Wrong

If you have been shipping with AI for more than a few weeks, you probably know this feeling.

You open a pull request. The code is clean enough. The naming is decent. The tests exist. Nothing looks obviously broken. And yet something about it feels off.

Maybe the boundary is a little blurry. Maybe the contract is implied instead of stated. Maybe the happy path is covered, but the real business rule still lives in somebody’s head. You can feel the risk, even if you cannot point to one catastrophic line.

That is the new review problem.

Most teams still run AI-assisted coding through an old workflow: write a prompt, generate a diff, open the PR, and ask a senior engineer to inspect the code carefully enough to catch broken invariants, missing edge cases, architecture drift, and hidden risk.

That model made sense when humans wrote almost all of the implementation themselves. It makes a lot less sense when a model can generate the implementation, the first test suite, the schema, and even the contract scaffolding in one pass.

At that point, line-by-line code review stops being the highest-leverage place to spend senior attention.

The real question is no longer, “Does this function look reasonable?”

The real question is, “Did we define the right behavior, boundary, and invariant before the function existed?”

That is the workflow inversion AI creates. Code review does not disappear. But the center of gravity moves upstream. In the AI era, code review becomes specification review.

Why This Shift Feels So Different

For a long time, most engineering teams treated specifications as supporting material.

The code was the real thing. The doc comment was a hint. The ADR was context if you had time to read it. The schema was “just validation.” The Gherkin file was nice to have if someone bothered to keep it current.

That hierarchy is breaking.

If an LLM can turn a specification into implementation artifacts quickly and repeatedly, then the specification is no longer secondary. It becomes the source object the rest of the system is derived from.

And that changes where mistakes are cheapest to catch.

If the specification is vague, the model can produce a beautifully structured implementation of the wrong thing. It can give you clean code, plausible tests, and a false sense of confidence. A strong code review can still miss the real problem because the bug is not in the syntax. The bug is in the intent.

That is what makes this moment so tricky and so important. AI makes it easier to produce convincing output. It also makes it much more dangerous to be sloppy about what you asked for.

If the specification is precise, bounded, and testable, everything gets easier. Implementation gets easier. Validation gets easier. Review gets easier. CI gets stronger.

That is why the best human review effort is shifting away from hand-auditing every branch and toward tightening the artifact that defines the branch behavior in the first place.

A Specification Is Not a Giant Requirements Document

When people hear the word “specification,” they often picture a bloated document nobody wants to write and nobody trusts six weeks later.

That is not what matters here.

In a practical AI-assisted workflow, a specification is any artifact that defines intended behavior tightly enough that a model can generate from it and a tool can enforce it.

That might be:

a Markdown ADR describing a boundary rule
a Zod schema defining external input shape
a function signature with a sharp doc comment
a Gherkin scenario that captures observable behavior
a contract block with preconditions and postconditions
a reducer model or state transition table

None of these need to be heavyweight. They just need to be clear enough that tools can do something useful with them.

That is the threshold that matters. A useful specification is not just readable by humans. It is actionable by the tooling around the codebase: the type checker, the test runner, and CI.
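To make "a function signature with a sharp doc comment" and "a contract block with preconditions and postconditions" concrete, here is a minimal sketch. The helper `withContract`, the function `applyDiscount`, and the discount rule itself are hypothetical names invented for illustration, not part of any particular library.

```typescript
// Hypothetical contract shape: conditions expressed as plain predicates.
interface Contract<A extends unknown[], R> {
  pre: (...args: A) => boolean;             // must hold before the call
  post: (result: R, ...args: A) => boolean; // must hold after the call
}

// Wraps an implementation so that both conditions are checked on every call.
function withContract<A extends unknown[], R>(
  name: string,
  contract: Contract<A, R>,
  impl: (...args: A) => R,
): (...args: A) => R {
  return (...args: A): R => {
    if (!contract.pre(...args)) {
      throw new Error(`${name}: precondition violated`);
    }
    const result = impl(...args);
    if (!contract.post(result, ...args)) {
      throw new Error(`${name}: postcondition violated`);
    }
    return result;
  };
}

// Small helper so the sketch does not depend on newer library typings.
const isWholeNumber = (n: number): boolean => isFinite(n) && Math.floor(n) === n;

/**
 * Applies a percentage discount to a price given in integer cents.
 * Pre:  priceCents is a non-negative integer; percent is between 0 and 100.
 * Post: the result is an integer with 0 <= result <= priceCents.
 */
const applyDiscount = withContract<[number, number], number>(
  "applyDiscount",
  {
    pre: (priceCents, percent) =>
      isWholeNumber(priceCents) && priceCents >= 0 && percent >= 0 && percent <= 100,
    post: (result, priceCents) =>
      isWholeNumber(result) && result >= 0 && result <= priceCents,
  },
  (priceCents, percent) => Math.round(priceCents * (1 - percent / 100)),
);
```

The implementation is a single line. The part worth reviewing is the doc comment and the two predicates, because any regenerated implementation has to pass through them.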

What Great Review Starts to Look Like

In the old model, a senior engineer spends their energy on questions like:

is this implementation clean?
did the author miss an edge case?
are these tests strong enough?
does this import violate a boundary?

Those questions still matter. They are just no longer the highest-leverage first questions.

In a specification-first workflow, the more valuable questions are:

is the contract actually correct?
does the schema define the real boundary?
are the business rules complete?
is the ADR precise enough to enforce?
do the listed properties express the semantics we actually want?

Those are better review questions because one good answer improves multiple artifacts at once.

If you tighten a vague ADR, you improve the architecture rule, the implementation guidance, and the CI enforcement in one move.

If you fix a weak schema, you improve runtime validation, type inference, and code generation quality in one move.

If you sharpen a contract, you improve implementation, tests, and mutation resistance in one move.

That is the leverage difference. You are no longer inspecting outputs one by one. You are reviewing the thing that shapes the outputs.
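The schema case is the easiest one to show in code. The sketch below hand-rolls a tiny schema layer so that it runs without dependencies; in practice a library such as Zod would play this role. The names `Parser`, `objectOf`, and `OrderRequest` are assumptions made up for this example.

```typescript
// A parser either returns a typed value or throws on invalid input.
type Parser<T> = (input: unknown) => T;

const str: Parser<string> = (input) => {
  if (typeof input !== "string") throw new Error("expected string");
  return input;
};

const positiveInt: Parser<number> = (input) => {
  if (typeof input !== "number" || !isFinite(input) || Math.floor(input) !== input || input <= 0) {
    throw new Error("expected positive integer");
  }
  return input;
};

// Builds an object parser from a shape; the static type is derived from the shape.
function objectOf<S extends Record<string, Parser<unknown>>>(
  shape: S,
): Parser<{ [K in keyof S]: ReturnType<S[K]> }> {
  const parsers: Record<string, Parser<unknown>> = shape;
  return (input) => {
    if (typeof input !== "object" || input === null) {
      throw new Error("expected object");
    }
    const record = input as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(parsers)) {
      try {
        out[key] = parsers[key](record[key]);
      } catch (e) {
        // Prefix the field name so the error points at the boundary violation.
        throw new Error(`${key}: ${(e as Error).message}`);
      }
    }
    return out as { [K in keyof S]: ReturnType<S[K]> };
  };
}

// One definition: runtime validation and the static type come from the same place.
const OrderRequest = objectOf({ sku: str, quantity: positiveInt });
type OrderRequest = ReturnType<typeof OrderRequest>; // { sku: string; quantity: number }

// Untrusted input crosses the boundary exactly once.
const order: OrderRequest = OrderRequest(JSON.parse('{"sku":"A-1","quantity":2}'));
```

If a reviewer decides that `quantity` must also have an upper bound, that is a one-line change to the schema, and both the validator and the inferred type follow from it.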

Why Traditional Code Review Starts to Crack

Traditional code review assumes that humans are the primary authors and that the reviewer can judge the quality of a human thought process by reading the finished code.

With AI, that assumption gets weaker every week.

A model can produce fifty plausible lines in seconds. It can produce another fifty just as quickly. And another after that. If your whole process depends on a reviewer manually catching semantic drift buried inside that stream, the review surface grows faster than human attention ever will.

That creates a bad equilibrium:

code generation speeds up
diffs get larger or more frequent
reviewer fatigue rises
semantic confidence falls
teams start compensating with vibes, intuition, and “looks good to me”

That is not a scaling strategy. That is accumulated risk with good formatting.

The better move is to reduce how much subjective review you need in the first place. Push more meaning into deterministic, reviewable specifications. Then let the machine check whether the code still aligns with that meaning.

CI Is What Makes This Real

This workflow inversion only works if the specification is tied to enforcement.

Otherwise you are just renaming documentation and hoping people respect it more.

The point is to make the specification operational.

That means:

architecture decisions compile into dependency rules
schemas define runtime-safe and type-safe boundaries
contracts generate executable checks
property lists drive test generation
critical semantics become merge gates

Once that happens, CI stops being a passive build system and becomes the mechanism that keeps implementation aligned with intent.
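As a rough sketch of "architecture decisions compile into dependency rules", suppose an ADR says that the domain layer imports nothing from the other layers. The layer names, the `src/` layout, and the functions `layerOf` and `findViolations` below are hypothetical. A real setup would get its import edges from an existing dependency analysis tool rather than a hand-written list.

```typescript
// Hypothetical layering taken from an ADR: each layer lists what it may import.
type Layer = "domain" | "application" | "infrastructure";

const allowedImports: Record<Layer, Layer[]> = {
  domain: [],
  application: ["domain"],
  infrastructure: ["domain", "application"],
};

// One import statement: file `from` imports file `to`.
interface ImportEdge {
  from: string;
  to: string;
}

// Maps a file path to its layer, assuming a src/<layer>/... layout.
function layerOf(path: string): Layer | undefined {
  const match = /^src\/(domain|application|infrastructure)\//.exec(path);
  return match ? (match[1] as Layer) : undefined;
}

// Returns one message per import that the rule table does not allow.
function findViolations(edges: ImportEdge[]): string[] {
  const violations: string[] = [];
  for (const edge of edges) {
    const from = layerOf(edge.from);
    const to = layerOf(edge.to);
    // Files outside the known layers and same-layer imports are not checked here.
    if (from === undefined || to === undefined || from === to) continue;
    if (allowedImports[from].indexOf(to) === -1) {
      violations.push(`${edge.from} (${from}) must not import ${edge.to} (${to})`);
    }
  }
  return violations;
}
```

A CI step can fail the build whenever `findViolations` returns a non-empty list. The reviewable artifact is then the `allowedImports` table, not every import in every diff.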

That is also why living specifications finally become realistic for normal teams. Historically, documentation rotted because nobody had the time to keep text and code in sync by hand.

AI changes the economics of writing and updating the artifacts. CI changes the economics of enforcing them.

You need both. AI without enforcement gives you polished drift. Enforcement without good specs gives you rigid confusion.
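Here is one way "property lists drive test generation" could look in miniature. The function `distinctSorted`, the property list, and the small seeded generator are all assumptions for this sketch. A real project would more likely use an established property-based testing library.

```typescript
// Hypothetical function under test: distinct values in ascending order.
function distinctSorted(xs: number[]): number[] {
  return xs
    .filter((x, i) => xs.indexOf(x) === i) // keep the first occurrence only
    .sort((a, b) => a - b);
}

// The reviewed artifact: named properties relating input to output.
interface Property {
  name: string;
  holds: (input: number[], output: number[]) => boolean;
}

const properties: Property[] = [
  {
    name: "output is strictly ascending (sorted, no duplicates)",
    holds: (_input, output) => output.every((x, i) => i === 0 || output[i - 1] < x),
  },
  {
    name: "every input value is kept",
    holds: (input, output) => input.every((x) => output.indexOf(x) !== -1),
  },
  {
    name: "no value is invented",
    holds: (input, output) => output.every((x) => input.indexOf(x) !== -1),
  },
];

// Small seeded generator so failures are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    // Linear congruential step; the product stays below 2^53, so it is exact.
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Runs every property against randomly generated inputs and collects failures.
function checkProperties(
  fn: (xs: number[]) => number[],
  runs: number,
  seed: number,
): string[] {
  const rng = makeRng(seed);
  const failures: string[] = [];
  for (let run = 0; run < runs; run++) {
    const length = Math.floor(rng() * 8);
    const input: number[] = [];
    for (let i = 0; i < length; i++) input.push(Math.floor(rng() * 10));
    const output = fn(input);
    for (const property of properties) {
      if (!property.holds(input, output)) {
        failures.push(`${property.name} failed for [${input.join(", ")}]`);
      }
    }
  }
  return failures;
}
```

The three entries in `properties` are what a senior engineer would review. The implementation can be regenerated as often as needed, as long as `checkProperties` keeps returning an empty list.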

Hard Guards, Soft Reviews

This is also the part of the Autotomy philosophy that feels increasingly right to me.

The idea is simple: put the non-negotiable rules into hard guards, then let human review focus on the parts that actually require judgment.

That means types, schemas, contracts, dependency rules, and deterministic checks handle the known failure modes. They run every time. They do not get tired. They do not get distracted by a nicely formatted diff. They do not care whether the code came from a staff engineer or a language model.
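Types are the cheapest hard guard of the set. As a small sketch, assume a hypothetical payment domain in which a refund can only exist after a capture. The `Payment` type and the functions below are invented for illustration.

```typescript
// Hypothetical domain: each state carries exactly the data that is valid for it.
type Payment =
  | { status: "pending" }
  | { status: "captured"; capturedAtMs: number }
  | { status: "refunded"; capturedAtMs: number; refundedAtMs: number };

// Reaching this function means a case was not handled; the `never` parameter
// makes the compiler reject any switch that forgets a status.
function assertNever(value: never): never {
  throw new Error(`unhandled case: ${JSON.stringify(value)}`);
}

function describePayment(payment: Payment): string {
  switch (payment.status) {
    case "pending":
      return "awaiting capture";
    case "captured":
      return `captured at ${payment.capturedAtMs}`;
    case "refunded":
      return `refunded ${payment.refundedAtMs - payment.capturedAtMs} ms after capture`;
    default:
      return assertNever(payment);
  }
}
```

A refunded payment without a capture timestamp does not type-check, and adding a fourth status without handling it in `describePayment` fails the build. Neither mistake ever reaches a human reviewer.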

Then the review layer gets smaller and better.

Instead of spending senior attention on things the system could have rejected automatically, you spend it on tradeoffs, semantics, interfaces, and architecture. You ask whether the boundary is correct, whether the contract is honest, whether the replacement really satisfies the interface, whether the system is getting easier or harder to change.

That last part matters a lot.

One of the healthiest side effects of specification-first work is that it pushes you toward cleaner cut points. If a module can be replaced as long as it satisfies the contract and passes the checks, you stop treating every implementation as sacred. You start designing for safe replacement instead of painful preservation.

That is a subtle shift, but it changes how teams handle growth. The codebase stops feeling like something you can only patch carefully from the inside. It starts feeling like a system with explicit seams.
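An explicit seam can be sketched as an interface plus a contract suite that any replacement must pass. Everything named here (`KeyValueStore`, the two implementations, and `checkStoreContract`) is hypothetical and kept deliberately small.

```typescript
// The seam: callers depend on this interface, never on a concrete class.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): boolean; // true only if the key existed
}

// First implementation: backed by a prototype-free object.
class ObjectStore implements KeyValueStore {
  private data: Record<string, string> = Object.create(null);
  get(key: string): string | undefined {
    return key in this.data ? this.data[key] : undefined;
  }
  set(key: string, value: string): void {
    this.data[key] = value;
  }
  delete(key: string): boolean {
    const existed = key in this.data;
    delete this.data[key];
    return existed;
  }
}

// Replacement implementation: backed by an array of pairs.
class ArrayStore implements KeyValueStore {
  private entries: Array<[string, string]> = [];
  private indexOfKey(key: string): number {
    for (let i = 0; i < this.entries.length; i++) {
      if (this.entries[i][0] === key) return i;
    }
    return -1;
  }
  get(key: string): string | undefined {
    const i = this.indexOfKey(key);
    return i === -1 ? undefined : this.entries[i][1];
  }
  set(key: string, value: string): void {
    const i = this.indexOfKey(key);
    if (i === -1) this.entries.push([key, value]);
    else this.entries[i][1] = value;
  }
  delete(key: string): boolean {
    const i = this.indexOfKey(key);
    if (i === -1) return false;
    this.entries.splice(i, 1);
    return true;
  }
}

// The contract suite: returns the names of the rules an implementation breaks.
function checkStoreContract(makeStore: () => KeyValueStore): string[] {
  const failures: string[] = [];
  const expect = (ok: boolean, rule: string): void => {
    if (!ok) failures.push(rule);
  };
  const store = makeStore();
  expect(store.get("a") === undefined, "missing key reads as undefined");
  store.set("a", "1");
  expect(store.get("a") === "1", "set then get returns the value");
  store.set("a", "2");
  expect(store.get("a") === "2", "set overwrites the previous value");
  expect(store.delete("a") === true, "delete reports an existing key");
  expect(store.get("a") === undefined, "deleted key reads as undefined");
  expect(store.delete("a") === false, "delete reports a missing key");
  return failures;
}
```

Swapping `ObjectStore` for `ArrayStore`, or for a freshly generated third implementation, becomes a question the suite answers instead of one a reviewer has to eyeball.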

This Is Actually Good News for Senior Engineers

This shift does not make senior engineering judgment less valuable. It makes it more focused and more important.

The senior engineer is no longer most valuable as a human syntax diff engine. They are most valuable where ambiguity gets resolved, invariants get chosen, interfaces get shaped, and tradeoffs get made explicit.

That means more time spent on:

writing precise ADRs
defining contracts and schemas
reviewing semantic changes instead of stylistic ones
deciding which properties deserve enforcement
turning architecture into mergeable rules

That is a much better use of expensive attention.

The machine is very good at filling in implementation detail. Senior engineers are still far better at deciding what must remain true when the system gets stressed, changed, and scaled.

You Do Not Need a Giant Process Rewrite

This can sound bigger than it really is.

You do not need to launch a formal methods initiative. You do not need to stop shipping. You do not need to bury the team in process.

Start with one narrow shift: treat one class of specification as a first-class reviewed artifact.

A practical sequence looks like this:

require precise doc comments and schemas on critical boundaries
treat ADR changes as senior-review items
generate tests and contracts from those artifacts
enforce architecture and contract drift in CI
downgrade style-only review comments in favor of semantic review

That is enough to change the culture.

Once teams feel that better specifications reduce downstream review churn, the model starts reinforcing itself. Reviews get sharper. Diffs get less scary. Trust improves.

The Real Strategic Shift

The teams that win with AI will not be the teams that merely generate code faster.

They will be the teams that move human attention to the narrowest, highest-leverage part of the workflow.

That part is the specification.

When the specification becomes the primary artifact, code stops being the only thing worth reviewing line by line. It becomes one output of a more disciplined system.

That is the real shift.

Implementation still matters. A lot. But increasingly, the most important review decision happens before the implementation exists.

And that is why, in the AI era, code review becomes specification review.