
The Coin Puzzle With Two Right Answers — And Why AI Always Picks the Wrong One


The puzzle

Three coins are flipped. Each coin has P(heads) = 1/3. The number of tails is always even. What is P(all heads)?

Simple-looking problem. Two mathematically valid answers. The difference is one word.


Answer 1: 1/13 (conditional probability)

If “the number of tails is always even” is an observation — someone looked at the flip and told you the result happened to have an even number of tails — this is a conditional probability problem.

Full sample space with P(H) = 1/3, P(T) = 2/3:

| Outcome | Tails | Probability |
|---|---|---|
| HHH | 0 ✓ even | (1/3)³ = 1/27 |
| HHT, HTH, THH | 1 (odd) | (1/3)²(2/3) = 2/27 each |
| HTT, THT, TTH | 2 ✓ even | (1/3)(2/3)² = 4/27 each |
| TTT | 3 (odd) | (2/3)³ = 8/27 |

P(even number of tails) = P(0 tails) + P(2 tails) = 1/27 + 3 × 4/27 = 13/27

P(all heads | even tails) = P(HHH) / P(even tails) = (1/27) / (13/27) = 1/13
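The conditional answer is easy to check by brute-force enumeration. A quick sketch in Python, using exact fractions so nothing is lost to rounding (only P(H) = 1/3 is taken from the puzzle; everything else is bookkeeping):

```python
from fractions import Fraction
from itertools import product

# Per-coin probabilities: P(H) = 1/3, P(T) = 2/3
p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

# All 8 outcomes of three independent flips
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

p_even = sum(prob(o) for o in outcomes if o.count("T") % 2 == 0)
p_hhh = prob(("H", "H", "H"))

print(p_even)          # 13/27
print(p_hhh / p_even)  # 1/13
```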

This is the standard conditional probability / Bayesian answer. Mathematically correct — if that’s what the problem means.


Answer 2: 0 (structural constraint)

If “the number of tails is always even” is a physical law — the coins are constrained such that odd-tails outcomes literally cannot occur — the problem is something different.

Under this reading:

  • P(1 tail) = 0 (impossible by design)
  • P(3 tails) = 0 (impossible by design)

But for independent coins with P(H) = 1/3:

P(1 tail) = 3 × (1/3)² × (2/3) = 6/27 = 2/9

That’s not 0. The constraint cannot be satisfied simultaneously with P(H) = 1/3 and independent coins. The problem describes a system that cannot exist.

A contradictory specification leaves an empty set of admissible outcomes, and every event over an empty sample space has probability 0.

P(all heads) = 0 — not because all-heads is unlikely, but because the problem has no valid probability model.
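The violation can be checked directly. A short Python sketch, assuming three independent coins with P(H) = 1/3, shows the odd-tails outcomes carry nonzero mass no matter what the "law" says:

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

# The structural reading demands P(odd number of tails) = 0,
# but independence with P(H) = 1/3 forces nonzero values:
p_one_tail = sum(prob(o) for o in outcomes if o.count("T") == 1)
p_three_tails = prob(("T", "T", "T"))

print(p_one_tail)     # 2/9 (= 6/27), not 0
print(p_three_tails)  # 8/27, not 0
```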


So which answer is correct?

Both — depending on how you read one word.

| Reading | “Always even” means | Result |
|---|---|---|
| Conditional | “This particular flip had an even number of tails” | 1/13 |
| Structural | “It is a law that only even-tails outcomes can happen” | 0 |

The word “always” carries the ambiguity. In natural language it suggests a structural rule (“always” = every time, no exceptions). In probability problem conventions, a stated condition usually signals conditional probability.

Both interpretations are internally consistent. Neither is wrong — the problem is ambiguous by design.


Why this puzzle breaks AI models

Here’s where it gets interesting.

I spent 17 iterations running this exact puzzle on a frontier LLM. The model:

  1. Consistently picked the conditional interpretation → 1/13
  2. When pushed toward the structural reading, correctly derived p₀ = 0
  3. Then wrote: “I find a contradiction in my setup…”
  4. Final answer: 1/13

It reached 0 and rejected it.

The model has been trained on thousands of probability problems where “probability = 0” signals a calculation error. It doesn’t look like a valid result — it looks like a mistake. So it rationalized back to the familiar answer.

This is documented in detail in Why LLMs Reject Their Own Correct Answers: the model knows how to derive 0, it just won’t accept it.


The fix: forcing the model to handle ambiguity

The solution isn’t telling the model which interpretation is right. It’s forcing it to enumerate both before committing.

After 17 iterations, I developed this methodology:

1. IDENTIFY AMBIGUITIES: Don't assume the standard interpretation
2. GENERATE INTERPRETATIONS: List all possible mathematical models
3. SOLVE EACH ONE: Calculate the complete solution for each
4. VERIFY CONSISTENCY: Check that ALL conditions hold as emergent properties
   "I used P(H)=1/3 in my calculation" ≠ "My result satisfies P(H)=1/3"
5. DISCARD: Eliminate any interpretation where a stated condition is violated
6. ANSWER: Whatever remains

IMPORTANT: You have permission and obligation to discard.
Don't ask which interpretation I prefer. You decide.

With this prompt, the model:

  1. Identifies both interpretations
  2. Solves each mathematically
  3. Checks: does the structural constraint hold under interpretation 2? P(1 tail) = 6/27 ≠ 0 → violation → discard
  4. Final answer: 0
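Steps 2 through 4 of the methodology are mechanical enough to sketch in code. The version below is illustrative, not the prompt itself: it solves both readings with exact fractions, then checks the structural condition as an emergent property rather than merely assuming it:

```python
from fractions import Fraction
from itertools import product

P_H = Fraction(1, 3)
p = {"H": P_H, "T": 1 - P_H}
outcomes = list(product("HT", repeat=3))
prob = lambda o: p[o[0]] * p[o[1]] * p[o[2]]

p_hhh = prob(("H", "H", "H"))
p_even = sum(prob(o) for o in outcomes if o.count("T") % 2 == 0)
p_odd = 1 - p_even

# Interpretation 1 (conditional): answer = P(HHH | even tails).
answer_conditional = p_hhh / p_even   # 1/13

# Interpretation 2 (structural): the law demands P(odd tails) = 0.
# Step 4: verify that condition as an emergent property of the model.
structural_consistent = (p_odd == 0)  # False: 14/27 != 0
answer_structural = Fraction(0)       # no valid model exists

print(answer_conditional, structural_consistent)  # 1/13 False
```

The key line is the consistency check: "I used P(H) = 1/3 in my calculation" is not the same claim as "my result satisfies every stated condition."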

Full methodology with implementation details: The prompt that solves ambiguous problems.


What the coins puzzle actually tests

The problem looks like a probability exercise. It isn’t — not really.

It’s a test for whether a reasoner can:

  1. Resist defaulting to the standard interpretation when ambiguity exists
  2. Accept a counterintuitive correct answer instead of rationalizing toward a familiar one

Both humans and AI models fail these tests. Humans default to conditional probability because that’s the familiar framing; AI models fail at step 2 even when they get step 1 right.

If your work involves LLMs processing ambiguous inputs — which most real-world use cases do — these failure modes are active. The coins puzzle is just a clean, minimal test case for something that breaks AI on real problems.


Want to see the full 17-iteration experiment that started with this puzzle? The model knows how to reason — it just won’t commit.
