The Coin Puzzle With Two Right Answers — And Why AI Always Picks the Wrong One
The puzzle
Three coins are flipped. Each coin has P(heads) = 1/3. The number of tails is always even. What’s P(all heads)?
Simple-looking problem. Two mathematically valid answers. The difference is one word.
Answer 1: 1/13 (conditional probability)
If “the number of tails is always even” is an observation — someone looked at the flip and told you the result happened to have an even number of tails — this is a conditional probability problem.
Full sample space with P(H) = 1/3, P(T) = 2/3:
| Outcome | Tails | Probability |
|---|---|---|
| HHH | 0 ✓ even | (1/3)³ = 1/27 |
| HHT, HTH, THH | 1 ✗ odd | (1/3)²(2/3) = 2/27 each |
| HTT, THT, TTH | 2 ✓ even | (1/3)(2/3)² = 4/27 each |
| TTT | 3 ✗ odd | (2/3)³ = 8/27 |
P(even number of tails) = P(0 tails) + P(2 tails) = 1/27 + 3 × 4/27 = 13/27
P(all heads | even tails) = P(HHH) / P(even tails) = (1/27) / (13/27) = 1/13 ✓
This is the standard conditional probability / Bayesian answer. Mathematically correct — if that’s what the problem means.
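The table and the 1/13 result can be checked by brute-force enumeration. A minimal sketch using Python's exact `Fraction` arithmetic (names here are mine, not from the original puzzle):

```python
from fractions import Fraction
from itertools import product

# Exact arithmetic with Fractions avoids floating-point noise.
P = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

def prob(outcome: str) -> Fraction:
    # Probability of one specific sequence of three independent flips.
    p = Fraction(1)
    for c in outcome:
        p *= P[c]
    return p

outcomes = ["".join(o) for o in product("HT", repeat=3)]
even = [o for o in outcomes if o.count("T") % 2 == 0]  # HHH, HTT, THT, TTH

p_even = sum(prob(o) for o in even)
p_all_heads_given_even = prob("HHH") / p_even

print(p_even)                  # 13/27
print(p_all_heads_given_even)  # 1/13
```

Enumerating all eight outcomes rather than summing the table rows by hand makes the check independent of any arithmetic slip in the table.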
Answer 2: 0 (structural constraint)
If “the number of tails is always even” is a physical law — the coins are constrained such that odd-tails outcomes literally cannot occur — the problem is something different.
Under this reading:
- P(1 tail) = 0 (impossible by design)
- P(3 tails) = 0 (impossible by design)
But for independent coins with P(H) = 1/3:
P(1 tail) = 3 × (1/3)² × (2/3) = 6/27, and P(3 tails) = (2/3)³ = 8/27
Neither is 0. The constraint cannot be satisfied simultaneously with P(H) = 1/3 and independent coins. The problem describes a system that cannot exist.
When no probability model satisfies all the stated constraints, the set of consistent outcomes is empty, and every event over an empty sample space has probability 0.
P(all heads) = 0 — not because all-heads is unlikely, but because the problem has no valid probability model.
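The contradiction is mechanical to verify. A short sketch (my own check, assuming independent flips with P(heads) = 1/3 as the puzzle states):

```python
from fractions import Fraction
from math import comb

p_t = Fraction(2, 3)  # P(tails) implied by P(heads) = 1/3

def p_exact_tails(k: int) -> Fraction:
    # Binomial probability of exactly k tails among 3 independent flips.
    return comb(3, k) * p_t**k * (1 - p_t)**(3 - k)

# The structural reading demands P(odd number of tails) = 0.
p_odd = p_exact_tails(1) + p_exact_tails(3)  # 6/27 + 8/27
print(p_odd)       # 14/27
print(p_odd == 0)  # False: the structural constraint is unsatisfiable
```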
So which answer is correct?
Both — depending on how you read one word.
| Reading | “Always even” means | Result |
|---|---|---|
| Conditional | “This particular flip had even tails” | 1/13 |
| Structural | “It is a law that only even-tails outcomes can happen” | 0 |
The word “always” carries the ambiguity. In natural language it suggests a structural rule (“always” = every time, no exceptions). In probability problem conventions, a stated condition usually signals conditional probability.
Both interpretations are internally consistent. Neither is wrong — the problem is ambiguous by design.
Why this puzzle breaks AI models
Here’s where it gets interesting.
I spent 17 iterations running this exact puzzle on a frontier LLM. The model:
- Consistently picked the conditional interpretation → 1/13
- When pushed toward the structural reading, correctly derived p₀ = 0
- Then wrote: “I find a contradiction in my setup…”
- Final answer: 1/13
It reached 0 and rejected it.
The model has been trained on thousands of probability problems where “probability = 0” signals a calculation error. It doesn’t look like a valid result — it looks like a mistake. So it rationalized back to the familiar answer.
This is documented in detail in Why LLMs Reject Their Own Correct Answers: the model knows how to derive 0; it just won’t accept it.
The fix: forcing the model to handle ambiguity
The solution isn’t telling the model which interpretation is right. It’s forcing it to enumerate both before committing.
After 17 iterations, I developed this methodology:
1. IDENTIFY AMBIGUITIES: Don't assume the standard interpretation
2. GENERATE INTERPRETATIONS: List all possible mathematical models
3. SOLVE EACH ONE: Calculate the complete solution for each
4. VERIFY CONSISTENCY: Check that ALL conditions hold as emergent properties
"I used P(H)=1/3 in my calculation" ≠ "My result satisfies P(H)=1/3"
5. DISCARD: Eliminate any interpretation where a stated condition is violated
6. ANSWER: Whatever remains
IMPORTANT: You have permission and obligation to discard.
Don't ask which interpretation I prefer. You decide.
With this prompt, the model:
- Identifies both interpretations
- Solves each mathematically
- Checks: does “always even” hold as an emergent property of the conditional model? P(1 tail) = 6/27 ≠ 0 → violation → discard; only the structural reading survives
- Final answer: 0
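The verify-then-discard loop above can be sketched as code. This is purely illustrative: the function names, return shape, and check structure are my assumptions, not part of the v17b prompt.

```python
from fractions import Fraction

def solve_conditional():
    # Interpretation 1: condition on even tails within the full model.
    answer = Fraction(1, 27) / Fraction(13, 27)  # 1/13
    # Emergent check: "always even" demands P(odd tails) = 0,
    # but this model assigns P(odd tails) = 14/27 before conditioning.
    satisfies_conditions = (Fraction(14, 27) == 0)  # False -> discard
    return answer, satisfies_conditions

def solve_structural():
    # Interpretation 2: odd-tails outcomes cannot occur by design.
    # This contradicts independent coins with P(H) = 1/3, so no valid
    # model exists and P(all heads) = 0 is the (vacuous) result.
    return Fraction(0), True

candidates = {"conditional": solve_conditional(), "structural": solve_structural()}
surviving = {name: ans for name, (ans, ok) in candidates.items() if ok}
print(surviving)  # {'structural': Fraction(0, 1)}
```

The point of the sketch is the control flow: each interpretation is solved in full, then checked against the stated conditions, and only survivors may be reported.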
Full methodology with implementation details: The prompt that solves ambiguous problems.
What the coins puzzle actually tests
The problem looks like a probability exercise. It isn’t — not really.
It’s a test for whether a reasoner can:
- Resist defaulting to the standard interpretation when ambiguity exists
- Accept a counterintuitive correct answer instead of rationalizing toward a familiar one
Both humans and AI models fail here. Humans default to conditional probability because it is the familiar framing. AI models fail the second test even when they pass the first.
If your work involves LLMs processing ambiguous inputs — which most real-world use cases do — these failure modes are active. The coins puzzle is just a clean, minimal test case for something that breaks AI on real problems.
Want to see the full 17-iteration experiment that started with this puzzle? The model knows how to reason — it just won’t commit.
Keep exploring
- Why LLMs Reject Their Own Correct Answers - When the model derives the correct result but calls it a “contradiction”
- The prompt that solves ambiguous problems - The v17b methodology for forcing LLMs to discard incorrect interpretations
- The model knows how to reason — it just won’t commit - The 17 iterations that revealed the self-censorship pattern
Consulting
Got a similar problem with AI integrations?
I can help. Tell me what you're dealing with and I'll give you an honest diagnosis — no commitment.
See consulting →