"GPT Said" Is Not a Reference

It keeps popping up in engineering discussions - design reviews, architecture threads, PR comments. Someone drops a take, and when asked to elaborate, the backup is: “Well, ChatGPT said…” or “According to Claude…”

As the industry matures with these tools, the habit is fading - but it's worth addressing directly, because it reveals something deeper than just sloppy attribution.

LLMs Are Tools, Not Sources

When you say “GPT said X,” you’re doing two things at once, and both are wrong.

First, you’re treating an LLM as a source of authority. It’s not. An LLM is a tool that generates plausible text based on statistical patterns. It has no expertise, no reputation to protect, and no accountability. It can produce correct, well-reasoned output - and it can produce confident-sounding nonsense. Often in the same conversation.

Second, you’re deflecting ownership. “GPT said” is a subtle way of saying “don’t blame me if this is wrong.” But you’re the one who chose to bring it into the discussion. You’re the one presenting it. It’s your statement now.

Three Anti-Patterns

1. Citing the Oracle

“According to Claude, the best approach here is to use event sourcing.”

This carries zero weight. You can’t trace it. You can’t reproduce it - ask the same question tomorrow and you might get the opposite recommendation. There’s no paper, no documentation, no implementation behind it. It’s an opinion from a next-token predictor dressed up as expert advice.

If event sourcing is the right call, explain why it’s the right call. What are the trade-offs? What alternatives did you consider? If you can’t do that, you don’t understand the recommendation well enough to make it.

2. The AI Pass-Through

This one is sneakier. Someone gets feedback from a senior or peer engineer. Instead of thinking it through, understanding the concern, and responding with their own reasoning - they copy the feedback into an LLM chat or agent, then copy-paste the response back.

The result reads fine. Grammatically correct. Technically plausible. But it’s empty calories. The person never actually engaged with the feedback. They outsourced the thinking, not just the typing.

This is worse than the oracle pattern because it breaks the feedback loop entirely. The whole point of a code review or design discussion is that two humans reason together. When you insert an LLM as a proxy, nobody is actually thinking.

3. Stopping at Plausible

LLMs are extremely good at producing answers that sound right. That’s literally what they’re optimized for. And that’s exactly what makes them dangerous when you stop at “sounds right” instead of pushing to “is right.”

You ask Claude about a race condition. It gives you a confident explanation with a clean code fix. You paste it in, tests pass, ship it. Three weeks later it blows up in production because the LLM’s mental model of your concurrency setup was subtly wrong. It didn’t know about that one edge case in your custom executor. It can’t - it never saw your runtime.

The plausible answer was the starting point, not the finish line.

Validation Is the Actual Work

Using agents and LLMs to get answers faster is fine. Obviously. But faster access to candidate answers means the bottleneck shifts to validation. And validation is now your job.

Write a test. This is the easiest one. If an LLM tells you X is true about your system, write a small test that proves it. Ironically, the same AI tools are great at writing these validation tests - so use the agent to prove the agent isn’t lying. And honestly, this part is kinda satisfying. You can ask it to write a quick one-off script in whatever language - Python, Go, a shell one-liner - just to prove or disprove a single point. You look at it, run it, get your answer, and throw it away a minute later. Disposable code as a verification tool. Which is cool.
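To make that concrete, here's a minimal sketch of the disposable-script idea. Suppose an LLM told you that Python's `dict.popitem()` removes the most recently inserted entry (LIFO order). Rather than taking that on faith, a ten-line throwaway script settles it:

```python
# Disposable verification script: does dict.popitem() really return
# the most recently inserted entry (LIFO)? Run it, read the answer,
# delete the file.
d = {}
d["first"] = 1
d["second"] = 2
d["third"] = 3

key, value = d.popitem()

# If the claim is wrong, this assert fails loudly instead of letting
# the wrong mental model slip into a design decision.
assert key == "third" and value == 3, f"popitem returned {key!r}"
print("confirmed: popitem() removes the last-inserted entry")
```

The specific claim here is just an illustration - the point is the shape of the exercise: one claim, one assert, one run, then throw it away.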

Ask for sources, then check them. LLMs can often point you to documentation, RFCs, or source code that backs their claim. But here’s the thing - when you actually ask “where did you get this?”, a decent chunk of the time you’ll get either a real reference you can verify, or a sudden “Actually…” Which tells you everything you need to know.

Read the code. AI tools are good at navigating and explaining source code. But don't stop at the explanation - open the actual implementation, read it, run it, step through it in a debugger if you need to. The code is the ground truth. The explanation is a hypothesis.

Cross-reference. If you’re making an architectural decision based on something an LLM told you, spend five minutes checking docs or other sources. If you can’t find independent confirmation, that’s a signal.

This Isn’t About Being Anti-AI

I use these tools constantly. They’re genuinely useful for moving faster, exploring unfamiliar codebases, generating boilerplate, rubber-ducking ideas. The productivity gains are real.

But there’s a difference between using a tool and hiding behind one. A calculator doesn’t make you a mathematician. An LLM doesn’t make you an architect. The value you bring is judgment - knowing when the output is right, when it’s subtly wrong, and when the question itself needs rethinking.

So next time you catch yourself reaching for “according to GPT…” or “Claude thinks…” - just don’t. If you believe the statement, own it. Explain your reasoning. If you can’t explain it without citing the AI, you haven’t done the work yet.