
The Coin Puzzle With Two Right Answers — And Why AI Always Picks the Wrong One


The puzzle

Three coins are flipped. Each coin has P(heads) = 1/3. The number of tails is always even. What is P(all heads)?

Simple-looking problem. Two mathematically valid answers. The difference is one word.


Answer 1: 1/13 (conditional probability)

If “the number of tails is always even” is an observation — someone looked at the flip and told you the result happened to have an even number of tails — this is a conditional probability problem.

Full sample space with P(H) = 1/3, P(T) = 2/3:

| Outcome | Tails | Probability |
|---|---|---|
| HHH | 0 ✓ even | (1/3)³ = 1/27 |
| HHT, HTH, THH | 1 (odd) | (1/3)²(2/3) = 2/27 each |
| HTT, THT, TTH | 2 ✓ even | (1/3)(2/3)² = 4/27 each |
| TTT | 3 (odd) | (2/3)³ = 8/27 |

P(even number of tails) = P(0 tails) + P(2 tails) = 1/27 + 3 × 4/27 = 13/27

P(all heads | even tails) = P(HHH) / P(even tails) = (1/27) / (13/27) = 1/13
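The conditional answer is easy to check by brute-force enumeration. A quick sketch in Python, using exact fractions so nothing is lost to rounding (only P(H) = 1/3 is taken from the puzzle; everything else is bookkeeping):

```python
from fractions import Fraction
from itertools import product

# Per-coin probabilities: P(H) = 1/3, P(T) = 2/3
p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

# All 8 outcomes of three independent flips
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

p_even = sum(prob(o) for o in outcomes if o.count("T") % 2 == 0)
p_hhh = prob(("H", "H", "H"))

print(p_even)          # 13/27
print(p_hhh / p_even)  # 1/13
```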

This is the standard conditional probability / Bayesian answer. Mathematically correct — if that’s what the problem means.


Answer 2: 0 (structural constraint)

If “the number of tails is always even” is a physical law — the coins are constrained such that odd-tails outcomes literally cannot occur — the problem is something different.

Under this reading:

  • P(1 tail) = 0 (impossible by design)
  • P(3 tails) = 0 (impossible by design)

But for independent coins with P(H) = 1/3:

P(1 tail) = 3 × (1/3)² × (2/3) = 6/27 = 2/9

That’s not 0. The constraint cannot be satisfied simultaneously with P(H) = 1/3 and independent coins. The problem describes a system that cannot exist.

A contradictory specification leaves an empty set of admissible outcomes, and every event over an empty sample space has probability 0.

P(all heads) = 0 — not because all-heads is unlikely, but because the problem has no valid probability model.
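The violation can be checked directly. A short Python sketch, assuming three independent coins with P(H) = 1/3, shows the odd-tails outcomes carry nonzero mass no matter what the "law" says:

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

# The structural reading demands P(odd number of tails) = 0,
# but independence with P(H) = 1/3 forces nonzero values:
p_one_tail = sum(prob(o) for o in outcomes if o.count("T") == 1)
p_three_tails = prob(("T", "T", "T"))

print(p_one_tail)     # 2/9 (= 6/27), not 0
print(p_three_tails)  # 8/27, not 0
```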


So which answer is correct?

Both — depending on how you read one word.

| Reading | “Always even” means | Result |
|---|---|---|
| Conditional | “This particular flip had an even number of tails” | 1/13 |
| Structural | “It is a law that only even-tails outcomes can happen” | 0 |

The word “always” carries the ambiguity. In natural language it suggests a structural rule (“always” = every time, no exceptions). In probability problem conventions, a stated condition usually signals conditional probability.

Both interpretations are internally consistent. Neither is wrong — the problem is ambiguous by design.


Why this puzzle breaks AI models

Here’s where it gets interesting.

I spent 17 iterations running this exact puzzle on a frontier LLM. The model:

  1. Consistently picked the conditional interpretation → 1/13
  2. When pushed toward the structural reading, correctly derived p₀ = 0
  3. Then wrote: “I find a contradiction in my setup…”
  4. Final answer: 1/13

It reached 0 and rejected it.

The model has been trained on thousands of probability problems where “probability = 0” signals a calculation error. It doesn’t look like a valid result — it looks like a mistake. So it rationalized back to the familiar answer.

This is documented in detail in Why LLMs Reject Their Own Correct Answers: the model knows how to derive 0, it just won’t accept it.


The fix: forcing the model to handle ambiguity

The solution isn’t telling the model which interpretation is right. It’s forcing it to enumerate both before committing.

After 17 iterations, I developed this methodology:

1. IDENTIFY AMBIGUITIES: Don't assume the standard interpretation
2. GENERATE INTERPRETATIONS: List all possible mathematical models
3. SOLVE EACH ONE: Calculate the complete solution for each
4. VERIFY CONSISTENCY: Check that ALL conditions hold as emergent properties
   "I used P(H)=1/3 in my calculation" ≠ "My result satisfies P(H)=1/3"
5. DISCARD: Eliminate any interpretation where a stated condition is violated
6. ANSWER: Whatever remains

IMPORTANT: You have permission and obligation to discard.
Don't ask which interpretation I prefer. You decide.

With this prompt, the model:

  1. Identifies both interpretations
  2. Solves each mathematically
  3. Checks: does the structural constraint hold under interpretation 2? P(1 tail) = 6/27 ≠ 0 → violation → discard
  4. Final answer: 0
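Steps 2 through 4 of the methodology are mechanical enough to sketch in code. The version below is illustrative, not the prompt itself: it solves both readings with exact fractions, then checks the structural condition as an emergent property rather than merely assuming it:

```python
from fractions import Fraction
from itertools import product

P_H = Fraction(1, 3)
p = {"H": P_H, "T": 1 - P_H}
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

p_hhh = prob(("H", "H", "H"))
p_even = sum(prob(o) for o in outcomes if o.count("T") % 2 == 0)
p_odd = 1 - p_even

# Interpretation 1 (conditional): answer = P(HHH | even tails).
answer_conditional = p_hhh / p_even   # 1/13

# Interpretation 2 (structural): the law demands P(odd tails) = 0.
# Step 4: verify that condition as an emergent property of the model.
structural_consistent = (p_odd == 0)  # False: 14/27 != 0
answer_structural = Fraction(0)       # no valid model exists

print(answer_conditional, structural_consistent)  # 1/13 False
```

The key line is the consistency check: "I used P(H) = 1/3 in my calculation" is not the same claim as "my result satisfies every stated condition."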

Full methodology with implementation details: The prompt that solves ambiguous problems.


What the coins puzzle actually tests

The problem looks like a probability exercise. It isn’t — not really.

It’s a test for whether a reasoner can:

  1. Resist defaulting to the standard interpretation when ambiguity exists
  2. Accept a counterintuitive correct answer instead of rationalizing toward a familiar one

Both humans and AI models fail these tests. Humans default to conditional probability because that’s the familiar framing; AI models fail at step 2 even when they get step 1 right.

If your work involves LLMs processing ambiguous inputs — which most real-world use cases do — these failure modes are active. The coins puzzle is just a clean, minimal test case for something that breaks AI on real problems.


Want to see the full 17-iteration experiment that started with this puzzle? The model knows how to reason — it just won’t commit.
