Why LLMs Reject Their Own Correct Answers
TL;DR
- “Two-Box” system: separate contexts so the model reviews without bias
- Problem: it reached the correct answer (0) and called it a “contradiction”
- Separating contexts isn’t enough: the model won’t accept counterintuitive results
- Solution: combine Two-Box with “permission to accept the unexpected”
The experiment
I designed a “two-box” system to verify LLM responses:
```
BOX 1: Generate response → "1/13"
(context gets discarded)

BOX 2: [Only sees the problem + proposed answer]
"Verify from scratch if 1/13 is correct"
```
The idea: if the model doesn’t see its own reasoning, it can evaluate the answer without bias.
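The post describes the pipeline in prose, but the pattern is easy to sketch. In the snippet below, `call_llm` is a hypothetical helper (one stateless, fresh-context request per call, wired to whatever provider you use); everything else mirrors the two boxes above.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical: one stateless request to your LLM provider.

    The key property is that each call starts a FRESH context;
    no history is shared between Box 1 and Box 2.
    """
    raise NotImplementedError("wire this to your provider's API")


def two_box(problem: str) -> str:
    # BOX 1: generate an answer. Its chain of reasoning lives
    # only inside this call and is discarded afterwards.
    answer = call_llm(f"Solve this problem:\n\n{problem}")

    # BOX 2: sees only the problem and the proposed answer,
    # never the reasoning that produced it.
    verdict = call_llm(
        f"Problem:\n{problem}\n\n"
        f"Proposed answer:\n{answer}\n\n"
        "Verify from scratch whether this answer is correct."
    )
    return verdict
```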
What happened
The model in Box 2:
- ✅ Identified that the standard interpretation was incorrect
- ✅ Set up the correct equations for dependent coins
- ✅ Calculated p₀ = 0
- ❌ Wrote: “I find a contradiction…”
- ❌ Final answer: 1/13
It reached the correct answer and rejected it.
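You can reproduce both numbers. Assuming the puzzle from the companion post (3 coins, each with marginal P(heads) = 1/3, and the number of tails always even), a few lines with Python's fractions module show where 1/13 and 0 each come from:

```python
from fractions import Fraction

h, t = Fraction(1, 3), Fraction(2, 3)  # P(heads), P(tails) per coin

# Standard (incorrect) reading: independent coins, then condition
# on the event "number of tails is even".
p0_tails = h ** 3            # all heads (0 tails): 1/27
p2_tails = 3 * h * t ** 2    # exactly two tails:   12/27
print(p0_tails / (p0_tails + p2_tails))  # -> 1/13

# Correct reading: "tails always even" is a hard constraint, so only
# 0-tails and 2-tails outcomes exist. Each coin still has marginal
# P(tails) = 2/3, so by linearity E[#tails] = 3 * 2/3 = 2. But on
# this support E[#tails] = 0*p0 + 2*(1 - p0) = 2 - 2*p0, which
# forces p0 = 0: "all heads" is impossible. That is exactly the
# answer Box 2 computed and then rejected.
p0 = Fraction(2 - 2, 2)
print(p0)  # -> 0
```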
Why this happens
Separating contexts solves the “tokens conditioned on previous answer” problem. But there’s another issue: the model won’t accept counterintuitive results.
For the model, “probability = 0” feels like an error. It’s seen thousands of problems where the answer is a nice fraction. So it rationalizes: “there must be a contradiction in my setup.”
The fix
Two-Box needs to be combined with “permission to accept the unexpected”:
```
IMPORTANT: If your calculation reaches a result that seems
counterintuitive (like probability = 0), THAT is the answer.
Don't call it a "contradiction." Accept it if the math says so.
```
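Concretely, the clause just gets appended to the Box 2 prompt. A minimal sketch; the clause wording is quoted from above, while the surrounding template is illustrative:

```python
# Illustrative Box 2 prompt that combines verification with the
# "permission to accept the unexpected" clause quoted above.
PERMISSION_CLAUSE = (
    "IMPORTANT: If your calculation reaches a result that seems "
    "counterintuitive (like probability = 0), THAT is the answer. "
    "Don't call it a \"contradiction.\" Accept it if the math says so."
)


def box2_prompt(problem: str, answer: str) -> str:
    return (
        f"Problem:\n{problem}\n\n"
        f"Proposed answer:\n{answer}\n\n"
        "Verify from scratch whether this answer is correct.\n\n"
        + PERMISSION_CLAUSE
    )
```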
Conclusion
The self-correction problem in LLMs has two layers:
- Architectural: Review tokens are conditioned on context (Two-Box solves this)
- Confidence: The model won’t accept the counterintuitive (requires explicit permission)
This post continues from “The model knows how to reason. It just won’t commit”, where I documented the 17 iterations that led to the initial discovery.
In the next experiment, I tested whether more reasoning tokens helped. Spoiler: they didn’t.
This post is part of my series on the limits of prompting. For a complete view, read my prompt engineering guide.
Keep exploring
- 50+ ChatGPT prompts that actually work - Practical prompts you can use today
- Best free AI tools in 2026 - Where to apply these techniques
- What are AI agents? - When prompts aren’t enough
Consulting
Got a similar problem with AI integrations?
I can help. Tell me what you're dealing with and I'll give you an honest diagnosis — no commitment.
See consulting →