The model knows how to reason. It just won't commit
TL;DR
- The model finds the correct answer but discards it for “not being standard”
- Three keys to the winning prompt: permission to look for alternatives, verification as an emergent property, and the obligation to choose
- Self-questioning fails because review tokens are conditioned on the previous answer
- Prompt engineering isn’t magic: it’s unblocking what’s already there
The discovery
I tested a probability problem where the correct answer is 0 and the common incorrect one is 1/13. After 17 prompt iterations, I discovered something:
The model DID find the correct interpretation. And discarded it.
In its “thoughts”:
“What if the coins aren’t independent? … No, that’s the standard.”
It saw the alternative. Evaluated it. Rejected it for not being “normal.”
The three ingredients of the winning prompt
1. Permission to look for alternatives
"Don't assume the standard interpretation"
Without this, the model self-censors. “The standard” is safe.
2. Obligation to verify as emergent property
"Does your model satisfy ALL conditions as a result,
not as input?"
“I used the number 1/3” ≠ “The result gives 1/3”
3. Permission and obligation to choose
"You have permission and obligation to discard.
Don't ask which I prefer. You decide."
Without this, it presents options and waits for the human to choose.
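Putting the three ingredients together: below is a minimal sketch of how they might be assembled into a single prompt. `ask_llm` is a hypothetical helper standing in for whatever client you use, and the wording paraphrases the instructions above rather than reproducing the full v17b prompt.

```python
# Minimal sketch: assembling the three ingredients into one prompt.
# `ask_llm` is a hypothetical callable (prompt -> answer text) standing in
# for whatever LLM client you use; the wording paraphrases the ingredients
# above and is not the full v17b prompt.

INGREDIENTS = """\
1. Don't assume the standard interpretation. Look for alternative
   readings of the problem before committing to one.
2. Check that your model satisfies ALL stated conditions as a result
   of the model, not because you fed them in as inputs.
3. You have permission and obligation to discard interpretations that
   fail. Don't ask which one I prefer. You decide.
"""


def build_prompt(problem: str) -> str:
    """Prepend the three ingredients to the problem statement."""
    return f"{INGREDIENTS}\nProblem:\n{problem}"


def solve(problem: str, ask_llm) -> str:
    """Run the assembled prompt through the (hypothetical) client."""
    return ask_llm(build_prompt(problem))
```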
What DIDN’T work
| Technique | Why it failed |
|---|---|
| Roleplay (“Dr. Rigor”) | 10M Monte Carlo simulations… verifying the wrong answer |
| DRAFT/REVISION buffer | Completed the format, same answer |
| "Be critical of yourself" | "Is it correct? Yes ✓" |
| Hostile scientist | Can’t be hostile to itself |
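For reference, the DRAFT/REVISION buffer in the table looked roughly like the template below (a hypothetical reconstruction, not the exact prompt used). The model fills in both sections, but because the REVISION is generated with the DRAFT already in context, it tends to restate the same answer.

```python
# Hypothetical reconstruction of the DRAFT/REVISION buffer technique.
# The model completes both sections, but the REVISION tokens are generated
# with the DRAFT already in context, so the answer rarely changes.

BUFFER_TEMPLATE = """\
DRAFT:
Work through the problem and state a preliminary answer.

REVISION:
Re-examine the DRAFT above. Point out any errors and state the final answer.
"""


def buffered_prompt(problem: str) -> str:
    """Append the DRAFT/REVISION buffer to the problem statement."""
    return f"{problem}\n\n{BUFFER_TEMPLATE}"
```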
The architectural cause
The problem goes deeper than prompting. Once the model writes “1/13”, the revision tokens are conditioned on that context:
Token t: "1/13"
Token t+1: "Verifying..." (conditioned on "...is 1/13")
Token t+2: "correct" (MORE LIKELY than "incorrect")
It can’t “un-see” its own answer. That’s why self-questioning fails.
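One practical consequence: if you want the model to genuinely re-check its answer, the check has to happen in a context that never contained the answer. Here is a rough sketch of that separation (it anticipates the "Two-Box" idea mentioned in the conclusion; `ask_llm` is again a hypothetical prompt-to-text helper):

```python
# Sketch: contrast a same-context review with a fresh-context re-derivation.
# `ask_llm` is a hypothetical callable (prompt -> answer text). The key point
# is that the second derivation is NOT conditioned on the first answer.

def solve_review_rederive(problem: str, ask_llm):
    answer = ask_llm(problem)

    # Same-context review: the original answer is in the prompt, so the
    # review tokens are conditioned on it and tend to confirm it.
    same_context_review = ask_llm(
        f"{problem}\n\nYour answer was: {answer}\nVerify whether it is correct."
    )

    # Fresh-context check: re-derive from scratch, never showing the answer.
    fresh_derivation = ask_llm(
        f"{problem}\n\nSolve from scratch. Don't assume the standard "
        f"interpretation, and check that every condition holds as a result."
    )

    return answer, same_context_review, fresh_derivation
```

If the fresh derivation disagrees with the original answer, that disagreement is the signal a same-context review cannot produce.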
Conclusion
The problem isn’t that the model can’t reason. It’s that:
- It doesn’t know it should look for alternatives
- It won’t commit without permission
- It confuses “verify” with “confirm”
Effective prompt engineering isn't magic. It's unblocking what's already there.
In my next experiment, I tried separating contexts with the “Two-Box” system and discovered a second layer to the problem. The final prompt that came out of these 17 iterations is documented in the v17b prompt.
Keep exploring
- 50+ ChatGPT prompts that actually work - Practical prompts you can use today
- Best free AI tools in 2026 - Where to apply these techniques
- LLM failure taxonomy - Understanding when and why LLMs fail
You might also like
- Taxonomy of LLM failures - The four types of errors in language models and which technique to use for each
- More tokens doesn't mean better results - How an exhaustive meta-prompt caused context overflow and reached the same error on a random walk problem
- The prompt that solves ambiguous problems - Practical guide to prompt v17b: a methodology for LLMs to identify and discard incorrect interpretations