The model knows how to reason. It just won't commit
TL;DR
- The model finds the correct answer but discards it for “not being standard”
- Three keys to the winning prompt: permission to look for alternatives, verification as an emergent property, and the obligation to choose
- Self-questioning fails because review tokens are conditioned on the previous answer
- Prompt engineering isn’t magic: it’s unblocking what’s already there
The discovery
I tested a probability problem where the correct answer is 0 and the common incorrect one is 1/13. After 17 prompt iterations, I discovered something:
The model DID find the correct interpretation. And discarded it.
In its “thoughts”:
“What if the coins aren’t independent? … No, that’s the standard.”
It saw the alternative. Evaluated it. Rejected it for not being “normal.”
The three ingredients of the winning prompt
1. Permission to look for alternatives
"Don't assume the standard interpretation"
Without this, the model self-censors. “The standard” is safe.
2. Obligation to verify as emergent property
"Does your model satisfy ALL conditions as a result,
not as input?"
“I used the number 1/3” ≠ “The result gives 1/3”
3. Permission and obligation to choose
"You have permission and obligation to discard.
Don't ask which I prefer. You decide."
Without this, it presents options and waits for the human to choose.
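Putting the three ingredients together: below is a minimal sketch of how they might be assembled into a single prompt. `ask_llm` is a hypothetical helper standing in for whatever client you use, and the wording paraphrases the instructions above rather than reproducing the full v17b prompt.

```python
# Minimal sketch: assembling the three ingredients into one prompt.
# `ask_llm` is a hypothetical callable (prompt -> answer text) standing in
# for whatever LLM client you use; the wording paraphrases the ingredients
# above and is not the full v17b prompt.

INGREDIENTS = """\
1. Don't assume the standard interpretation. Look for alternative
   readings of the problem before committing to one.
2. Check that your model satisfies ALL stated conditions as a result
   of the model, not because you fed them in as inputs.
3. You have permission and obligation to discard interpretations that
   fail. Don't ask which one I prefer. You decide.
"""


def build_prompt(problem: str) -> str:
    """Prepend the three ingredients to the problem statement."""
    return f"{INGREDIENTS}\nProblem:\n{problem}"


def solve(problem: str, ask_llm) -> str:
    """Run the assembled prompt through the (hypothetical) client."""
    return ask_llm(build_prompt(problem))
```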
What DIDN’T work
| Technique | Why it failed |
|---|---|
| Roleplay (“Dr. Rigor”) | 10M Monte Carlo simulations… verifying the wrong answer |
| DRAFT/REVISION buffer | Completed the format, same answer |
| "Be critical of yourself" | "Is it correct? Yes ✓" |
| Hostile scientist | Can’t be hostile to itself |
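For reference, the DRAFT/REVISION buffer in the table looked roughly like the template below (a hypothetical reconstruction, not the exact prompt used). The model fills in both sections, but because the REVISION is generated with the DRAFT already in context, it tends to restate the same answer.

```python
# Hypothetical reconstruction of the DRAFT/REVISION buffer technique.
# The model completes both sections, but the REVISION tokens are generated
# with the DRAFT already in context, so the answer rarely changes.

BUFFER_TEMPLATE = """\
DRAFT:
Work through the problem and state a preliminary answer.

REVISION:
Re-examine the DRAFT above. Point out any errors and state the final answer.
"""


def buffered_prompt(problem: str) -> str:
    """Append the DRAFT/REVISION buffer to the problem statement."""
    return f"{problem}\n\n{BUFFER_TEMPLATE}"
```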
The architectural cause
The problem goes deeper than prompting. Once the model writes “1/13”, the revision tokens are conditioned on that context:
Token t: "1/13"
Token t+1: "Verifying..." (conditioned on "...is 1/13")
Token t+2: "correct" (MORE LIKELY than "incorrect")
It can’t “un-see” its own answer. That’s why self-questioning fails.
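One practical consequence: if you want the model to genuinely re-check its answer, the check has to happen in a context that never contained the answer. Here is a rough sketch of that separation (it anticipates the "Two-Box" idea mentioned in the conclusion; `ask_llm` is again a hypothetical prompt-to-text helper):

```python
# Sketch: contrast a same-context review with a fresh-context re-derivation.
# `ask_llm` is a hypothetical callable (prompt -> answer text). The key point
# is that the second derivation is NOT conditioned on the first answer.

def solve_review_rederive(problem: str, ask_llm):
    answer = ask_llm(problem)

    # Same-context review: the original answer is in the prompt, so the
    # review tokens are conditioned on it and tend to confirm it.
    same_context_review = ask_llm(
        f"{problem}\n\nYour answer was: {answer}\nVerify whether it is correct."
    )

    # Fresh-context check: re-derive from scratch, never showing the answer.
    fresh_derivation = ask_llm(
        f"{problem}\n\nSolve from scratch. Don't assume the standard "
        f"interpretation, and check that every condition holds as a result."
    )

    return answer, same_context_review, fresh_derivation
```

If the fresh derivation disagrees with the original answer, that disagreement is the signal a same-context review cannot produce.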
Conclusion
The problem isn’t that the model can’t reason. It’s that:
- It doesn’t know it should look for alternatives
- It won’t commit without permission
- It confuses “verify” with “confirm”
Effective prompt engineering isn't magic. It's unblocking what's already there.
In my next experiment, I tried separating contexts with the “Two-Box” system and discovered a second layer to the problem. The final prompt that came out of these 17 iterations is documented in the v17b prompt.
Keep exploring
- 50+ ChatGPT prompts that actually work - Practical prompts you can use today
- Best free AI tools in 2026 - Where to apply these techniques
- LLM failure taxonomy - Understanding when and why LLMs fail
You might also like
- Taxonomy of LLM failures - The four types of errors in language models and which technique to use for each
- More tokens doesn't mean better results - How an exhaustive meta-prompt caused context overflow and reached the same error on a random walk problem
- The prompt that solves ambiguous problems - Practical guide to prompt v17b: a methodology for LLMs to identify and discard incorrect interpretations