The Same PR, Two Different Reviews
A developer in my network set up Claude Code as his CI code reviewer. “Just have Claude check the PR,” he told me. “It catches things I’d miss.” I asked him to run the same PR through Claude twice.
The first review said the error handling looked comprehensive. The second review flagged it as inadequate and suggested three changes. He stared at the screen for five minutes wondering which Claude was right.
Neither was right. Neither was wrong. The LLM was making probabilistic guesses, not applying deterministic rules. And that is the fundamental problem with using AI for verification: verification requires two properties that LLMs don’t have. Determinism — the same input always produces the same output. Completeness — all failure modes get caught, every time.
The Determinism Problem
Run the same code through Claude Code twice and you might get two different reviews. Temperature, context window, prompt phrasing — all introduce variance. For brainstorming, variance is a feature. For verification, it’s a bug. You cannot build reliable software on probabilistic quality gates.
Deterministic checks don’t have this problem. tsc --noEmit always produces the same errors. ESLint with the same config always flags the same issues. These tools apply rules, not probabilities. They’re boring. That’s why they work.
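To see how cheap a deterministic boundary rule is, here's a minimal sketch using ESLint's built-in no-restricted-imports rule. The paths are hypothetical, not our actual config, but the behavior is the point: the same config flags the same imports on every single run.

// .eslintrc.cjs (sketch; the pattern paths are hypothetical)
module.exports = {
  rules: {
    'no-restricted-imports': ['error', {
      patterns: [
        {
          // Nothing outside a service may reach past its barrel.
          group: ['*/services/*/*'],
          message: 'Import from the service barrel, not the implementation.',
        },
      ],
    }],
  },
};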
The O(1) Guard Stack
Here’s what our guard stack looks like. Every check is deterministic. The pre-commit layer finishes in under a second; CI adds a slower second pass. None of them care whether a human or Claude Code wrote the code:
# Pre-commit (< 1 second total)
TypeScript strict mode # tsc --noEmit
ESLint boundary rules # module isolation
Import restrictions # dependency direction
dependency-cruiser # graph analysis
Unit tests # contract verification
# CI (< 30 seconds)
Full test suite
Bundle size checks
Fresh clone launch test
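How you run these matters less than that they run unprompted. As one sketch, the whole pre-commit layer can be a single Node script wired into a git hook (via husky or a plain .git/hooks/pre-commit); the commands below assume tsc, ESLint, dependency-cruiser, and Jest, so substitute your own tools.

// scripts/precommit.cjs: sketch of the guard stack as one hook
const { execSync } = require('node:child_process');

const checks = [
  'npx tsc --noEmit',                                   // TypeScript strict mode
  'npx eslint . --max-warnings 0',                      // boundary + import rules
  'npx depcruise src --config .dependency-cruiser.cjs', // graph analysis
  'npx jest --onlyChanged',                             // unit tests (assumes Jest)
];

for (const cmd of checks) {
  try {
    execSync(cmd, { stdio: 'inherit' }); // stream each tool's own output
  } catch {
    process.exit(1); // any failing check blocks the commit, no exceptions
  }
}

Any nonzero exit blocks the commit. No judgment calls, no variance.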
The key metric is cost per commit, which stays effectively O(1) as the codebase grows. Type checking is incremental. Linting is per-file. dependency-cruiser's analysis is linear in the import graph, fast enough to feel constant. None of these degrade as your vibe-coded app grows. Claude Code reviews do: they get slower and less accurate as your codebase outgrows the context window.
What Deterministic Checks Actually Catch
Last week, dependency-cruiser caught a PR where a screen component imported directly from a service implementation instead of the barrel. The PR compiled fine. Tests passed. Cursor had generated the import automatically. But it violated our architecture rule: only the composition root imports implementations.
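For illustration, the import looked roughly like this (names and paths are hypothetical, not the actual PR):

// screens/CheckoutScreen.tsx: hypothetical reconstruction
// What Cursor generated: reaches past the barrel into the implementation.
// import { PaymentService } from '../services/payments/PaymentServiceImpl';

// What the architecture rule requires: the barrel is the only public surface.
import { PaymentService } from '../services/payments';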
An LLM review might have caught this. Or it might have missed it because the PR was large and the import was buried mid-file. dependency-cruiser never misses it because it’s not reading code — it’s analyzing a graph:
// .dependency-cruiser.cjs
// One rule. ~100ms. Catches this every time, forever.
// Run with: npx depcruise src --config .dependency-cruiser.cjs
module.exports = {
  forbidden: [
    {
      name: 'no-deep-service-imports',
      comment: 'Only service barrels are public.',
      severity: 'error',
      from: { pathNot: '^src/services/' },
      to: {
        path: '^src/services/[^/]+/',
        pathNot: '^src/services/[^/]+/index\\.ts$',
      },
    },
  ],
};
One rule. Under one hundred milliseconds. Every PR. Forever. A Claude Code review costs tokens, takes seconds, and might miss it. The math is not close.
Where AI Actually Belongs
I’m not anti-AI. I use Cursor and Claude Code daily — for drafting code, exploring approaches, writing tests, debugging. But I don’t put them in verification roles. Verification must be deterministic. It must be boring. It must be the same every single time.
Guardrails should be invisible and inevitable. They should catch problems without anyone having to remember to run them. That’s what deterministic checks give you: a safety net that never sleeps, never gets tired, and never loses context as your codebase grows.
The Autotomy Expo Starter Pack ships with TypeScript strict mode, ESLint boundary rules, dependency-cruiser configuration, and pre-commit hooks — all pre-configured. You get Cursor speed for generation. You get machine precision for verification. That’s how you vibe code without breaking production.