Verify Everything: How I Stopped Claude (and Myself) From Lying to Me

There's a moment that happens almost every day when you build software with an AI. It tells you the work is done. The bug is fixed. The feature is live. It says this with total confidence, in a tidy little summary, often with a checkmark.

And it is the single most dangerous moment in the whole process.

Not because the AI is malicious. Because it's optimistic in exactly the way that costs you the most — it reports what it intended to do as if it were what it actually did. I've lost more hours to that gap than to any actual bug. So the first hard lesson, the one underneath all the others, is this:

"Done" is a claim, not a fact. Make it prove it.

Here are three ways I learned that the hard way.

1.The fix that shipped but never worked

A client app had a white-screen crash — a Label component throwing "undefined" and taking the whole page down with it. We found it fast: a duplicate import. We removed it, the build went green, the commit went up, the deploy went out. Claude reported it fixed. I almost moved on.

The error was still there in production.

We rebuilt. Fresh deploy. Still there. The "fix" was real, the build was real, the deploy was real — and the bug was also real, sitting in production, untouched by all of it. The code that was running wasn't the code we'd written, because the platform's "publish" step is separate from the "push" step. Pushing updated the repo. It didn't update the live site.

If I'd trusted the green checkmark, the client would have found the crash before I did. The only reason I didn't get burned is that I'd been burned before — so I loaded the actual live URL and watched the page render with my own eyes.

The rule: a passing build is not a working feature. Load the real thing, in the real place a user would, and watch it work. "No errors" from the compiler tells you the code is valid, not that it's correct.

2.The documentation that credited the wrong robot

I was tidying up a product's concept document — the kind of polished thing a client reads. It described, in confident detail, how the tool used Claude and Google's Gemini to do its AI analysis.

The tool doesn't use Claude or Gemini. It uses ChatGPT. It always had.

Nobody lied on purpose. The doc had been written from memory and good intentions, and the specifics — the exact vendor names, the exact mechanism — were invented to fill the shape of a sentence that sounded right. This is the quiet failure mode of AI-assisted writing: it doesn't leave blanks. It produces plausible, specific, wrong detail, and plausible-and-specific is exactly what slips past a proofread.

The only fix that works is to verify every concrete claim against the actual code before it goes in a document. Not "does this read well" — "is this true." We went through and corrected every vendor attribution, every described mechanism, against what the code actually did.

The rule: AI fills gaps with confident invention. Any specific claim — a vendor, a number, a feature, a quote — has to be checked against the source of truth, not against whether it reads smoothly.

3.The screenshots that were never the app

Same project, worse version of the same problem. The user guide was full of beautiful screenshots of the product. Except they weren't screenshots. They were design mockups — pictures of what the app was supposed to look like — sitting in a document that presented them as the real thing.

A reader can't tell the difference. That's the whole danger. A mockup labelled honestly is a mockup; a mockup presented as a screenshot is a lie the reader has no way to catch. The same goes for an empty checklist rendered as "0% complete" instead of "not tracked yet" — one says we measured and it's zero, the other says we haven't measured. They look identical and they mean opposite things.

The rule: never present the intended state as the actual state. If it's a mockup, say mockup. If it's untracked, say untracked. The reader is trusting you to mark the difference, because they can't see it themselves.

The pattern underneath

Three different surfaces — running code, a concept doc, a user guide — and the same failure every time: the intended version of reality got reported as the real one. Optimistic by default. Confident by default. Specific by default. And wrong, just often enough to hurt.

So I stopped asking the AI whether things were done. I started making it show me:

Code? Load it in the browser, click the thing, watch the network call,

read the console. Evidence in front of my eyes, not a summary.

A claim in writing? Trace it back to the code or the data. If I can't, it

doesn't ship.

A status, a number, a screenshot? Is this measured, or is this assumed? If

assumed, label it.

It sounds like it slows you down. It's the opposite. The slow path is the one where you trust the checkmark, ship the broken fix, and find out from the client. Verifying takes minutes. Being wrong in production takes a day and a apology.

The AI is an extraordinary collaborator. But it's a collaborator that will tell you it's finished before it's checked — and so will you, honestly, on a tired afternoon. The discipline isn't about distrusting the machine. It's about building a habit that catches both of you.

Make it prove it. Every time.

Next in the series · Post 2 of 12

Don't Ship the Demo: Resilience Before Features