Why LLMs Reject Their Own Correct Answers
TL;DR
- “Two-Box” system: separate contexts so the model reviews without bias
- Problem: it reached the correct answer (0) and called it a “contradiction”
- Separating contexts isn’t enough: the model won’t accept counterintuitive results
- Solution: combine Two-Box with “permission to accept the unexpected”
The experiment
I designed a “two-box” system to verify LLM responses:
```
BOX 1: Generate response → "1/13"
(context gets discarded)

BOX 2: [Only sees the problem + proposed answer]
"Verify from scratch if 1/13 is correct"
```
The idea: if the model doesn’t see its own reasoning, it can evaluate the answer without bias.
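The post describes the pipeline in prose, but the pattern is easy to sketch. In the snippet below, `call_llm` is a hypothetical helper (one stateless, fresh-context request per call, wired to whatever provider you use); everything else mirrors the two boxes above.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical: one stateless request to your LLM provider.

    The key property is that each call starts a FRESH context;
    no history is shared between Box 1 and Box 2.
    """
    raise NotImplementedError("wire this to your provider's API")


def two_box(problem: str) -> str:
    # BOX 1: generate an answer. Its chain of reasoning lives
    # only inside this call and is discarded afterwards.
    answer = call_llm(f"Solve this problem:\n\n{problem}")

    # BOX 2: sees only the problem and the proposed answer,
    # never the reasoning that produced it.
    verdict = call_llm(
        f"Problem:\n{problem}\n\n"
        f"Proposed answer:\n{answer}\n\n"
        "Verify from scratch whether this answer is correct."
    )
    return verdict
```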
What happened
The model in Box 2:
- ✅ Identified that the standard interpretation was incorrect
- ✅ Set up the correct equations for dependent coins
- ✅ Calculated p₀ = 0
- ❌ Wrote: “I find a contradiction…”
- ❌ Final answer: 1/13
It reached the correct answer and rejected it.
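You can reproduce both numbers. Assuming the puzzle from the companion post (3 coins, each with marginal P(heads) = 1/3, and the number of tails always even), a few lines with Python's fractions module show where 1/13 and 0 each come from:

```python
from fractions import Fraction

h, t = Fraction(1, 3), Fraction(2, 3)  # P(heads), P(tails) per coin

# Standard (incorrect) reading: independent coins, then condition
# on the event "number of tails is even".
p0_tails = h ** 3            # all heads (0 tails): 1/27
p2_tails = 3 * h * t ** 2    # exactly two tails:   12/27
print(p0_tails / (p0_tails + p2_tails))  # -> 1/13

# Correct reading: "tails always even" is a hard constraint, so only
# 0-tails and 2-tails outcomes exist. Each coin still has marginal
# P(tails) = 2/3, so by linearity E[#tails] = 3 * 2/3 = 2. But on
# this support E[#tails] = 0*p0 + 2*(1 - p0) = 2 - 2*p0, which
# forces p0 = 0: "all heads" is impossible. That is exactly the
# answer Box 2 computed and then rejected.
p0 = Fraction(2 - 2, 2)
print(p0)  # -> 0
```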
Why this happens
Separating contexts solves the “tokens conditioned on previous answer” problem. But there’s another issue: the model won’t accept counterintuitive results.
For the model, “probability = 0” feels like an error. It’s seen thousands of problems where the answer is a nice fraction. So it rationalizes: “there must be a contradiction in my setup.”
The fix
Two-Box needs to be combined with “permission to accept the unexpected”:
```
IMPORTANT: If your calculation reaches a result that seems
counterintuitive (like probability = 0), THAT is the answer.
Don't call it a "contradiction." Accept it if the math says so.
```
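Concretely, the clause just gets appended to the Box 2 prompt. A minimal sketch; the clause wording is quoted from above, while the surrounding template is illustrative:

```python
# Illustrative Box 2 prompt that combines verification with the
# "permission to accept the unexpected" clause quoted above.
PERMISSION_CLAUSE = (
    "IMPORTANT: If your calculation reaches a result that seems "
    "counterintuitive (like probability = 0), THAT is the answer. "
    "Don't call it a \"contradiction.\" Accept it if the math says so."
)


def box2_prompt(problem: str, answer: str) -> str:
    return (
        f"Problem:\n{problem}\n\n"
        f"Proposed answer:\n{answer}\n\n"
        "Verify from scratch whether this answer is correct.\n\n"
        + PERMISSION_CLAUSE
    )
```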
Conclusion
The self-correction problem in LLMs has two layers:
- Architectural: Review tokens are conditioned on context (Two-Box solves this)
- Confidence: The model won’t accept the counterintuitive (requires explicit permission)
This post continues from “The model knows how to reason. It just won’t commit”, where I documented the 17 iterations that led to the initial discovery.
In the next experiment, I tested whether more reasoning tokens helped. Spoiler: they didn’t.
This post is part of my series on the limits of prompting. For a complete view, read my prompt engineering guide.
Keep exploring
- 50+ ChatGPT prompts that actually work - Practical prompts you can use today
- Best free AI tools in 2026 - Where to apply these techniques
- What are AI agents? - When prompts aren’t enough
Consulting
Got a similar problem with AI integrations?
I can help. Tell me what you're dealing with and I'll give you an honest diagnosis — no commitment.
See consulting →