3 Coins, Tails Always Even: Is It 0 or 1/13?
The puzzle
3 coins are flipped. Each coin has P(heads) = 1/3. The number of tails is always even. What’s P(all heads)?
Simple-looking problem. Two mathematically valid answers. The difference is one word.
Answer 1: 1/13 (conditional probability)
If “the number of tails is always even” is an observation — someone looked at the flip and told you the result happened to have an even number of tails — this is a conditional probability problem.
Full sample space with P(H) = 1/3, P(T) = 2/3:
| Outcome | Tails | Probability |
|---|---|---|
| HHH | 0 ✓ even | (1/3)³ = 1/27 |
| HHT, HTH, THH | 1 odd | (1/3)²(2/3) = 2/27 each |
| HTT, THT, TTH | 2 ✓ even | (1/3)(2/3)² = 4/27 each |
| TTT | 3 odd | (2/3)³ = 8/27 |
P(even number of tails) = P(0 tails) + P(2 tails) = 1/27 + 3 × 4/27 = 13/27
P(all heads | even tails) = P(HHH) / P(even tails) = (1/27) / (13/27) = 1/13 ✓
This is the standard conditional probability / Bayesian answer. Mathematically correct — if that’s what the problem means.
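If you want to check the arithmetic, a minimal brute-force sketch with exact fractions reproduces the table above and the 1/13 (nothing here is assumed beyond the puzzle's stated P(heads) = 1/3):

```python
# Brute-force check of the conditional reading with exact fractions.
from itertools import product
from fractions import Fraction

P_HEADS = Fraction(1, 3)  # per-coin probability of heads, as stated in the puzzle

def prob(outcome: str) -> Fraction:
    """Probability of a specific outcome like 'HHT' for three independent coins."""
    p = Fraction(1)
    for c in outcome:
        p *= P_HEADS if c == "H" else 1 - P_HEADS
    return p

outcomes = ["".join(o) for o in product("HT", repeat=3)]

p_even = sum(prob(o) for o in outcomes if o.count("T") % 2 == 0)  # 13/27
p_all_heads = prob("HHH")                                         # 1/27

print(p_all_heads / p_even)  # 1/13
```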
Answer 2: 0 (structural constraint)
If “the number of tails is always even” is a physical law — the coins are constrained such that odd-tails outcomes literally cannot occur — the problem is something different.
Under this reading:
- P(1 tail) = 0 (impossible by design)
- P(3 tails) = 0 (impossible by design)
But for independent coins with P(H) = 1/3:
P(1 tail) = 3 × (1/3)² × (2/3) = 6/27
That’s not 0. The constraint cannot be satisfied simultaneously with P(H) = 1/3 and independent coins. The problem describes a system that cannot exist.
A system that cannot exist has an empty set of possible outcomes, and every event over an empty outcome space has probability 0.
P(all heads) = 0 — not because all-heads is unlikely, but because the problem has no valid probability model.
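The same enumeration makes the inconsistency visible: under three independent coins with P(heads) = 1/3, the outcomes the structural reading forbids still carry probability. A minimal sketch:

```python
# Structural reading check: odd-tails outcomes should be impossible, but are they?
from itertools import product
from fractions import Fraction

P_HEADS = Fraction(1, 3)

def prob(outcome: str) -> Fraction:
    """Probability of one outcome for three independent coins with P(H) = 1/3."""
    p = Fraction(1)
    for c in outcome:
        p *= P_HEADS if c == "H" else 1 - P_HEADS
    return p

outcomes = ["".join(o) for o in product("HT", repeat=3)]

p_one_tail = sum(prob(o) for o in outcomes if o.count("T") == 1)  # 6/27
p_three_tails = prob("TTT")                                       # 8/27

# Under "always even" as a law, both of these must be 0. Neither is,
# so independent coins with P(heads) = 1/3 cannot satisfy the constraint.
print(p_one_tail, p_three_tails)  # 2/9 8/27 (i.e. 6/27 and 8/27)
```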
Which is correct?
Both — depending on how you read one word.
| Reading | “Always even” means | Result |
|---|---|---|
| Conditional | “This particular flip had even tails” | 1/13 |
| Structural | “It is a law that only even-tails outcomes can happen” | 0 |
The word “always” carries the ambiguity. In natural language it suggests a structural rule (“always” = every time, no exceptions). In probability problem conventions, a stated condition usually signals conditional probability.
Both interpretations are internally consistent. Neither is wrong — the problem is ambiguous by design.
Why this breaks AI
Here’s where it gets interesting.
I spent 17 iterations running this exact puzzle on a frontier LLM. The model:
- Consistently picked the conditional interpretation → 1/13
- When pushed toward the structural reading, correctly derived P(all heads) = 0
- Then wrote: “I find a contradiction in my setup…”
- Final answer: 1/13
It reached 0 and rejected it.
The model has been trained on thousands of probability problems where “probability = 0” signals a calculation error. It doesn’t look like a valid result — it looks like a mistake. So it rationalized back to the familiar answer.
This is documented in detail in Why LLMs Reject Their Own Correct Answers: the model knows how to derive 0, it just won’t accept it. And 17 iterations of watching the model find the right answer and call it wrong is what forced the solution below.
The fix: v17b prompt
The solution isn’t telling the model which interpretation is right. It’s forcing it to enumerate both before committing.
Methodology for solving problems with conditions:
1. IDENTIFY AMBIGUITIES: Don't assume the "standard" interpretation
2. GENERATE INTERPRETATIONS: List ALL possible ways to mathematically model each condition
3. SOLVE EACH ONE: Calculate the complete solution for each interpretation
4. VERIFY CONSISTENCY: For each interpretation, check that your model satisfies ALL conditions as emergent property. "I used the data" ≠ "The result satisfies the data"
5. DISCARD: Eliminate interpretations where a condition from the problem statement is NOT met in the final model
6. ANSWER: The one that remains

IMPORTANT: You have permission and obligation to discard. Don't ask which I prefer. You decide.
With this prompt, the model works through the coins puzzle like this:
- Identifies both interpretations: conditional (observation) vs. structural (law)
- Solves each: conditional → 1/13 via Bayes; structural → check if P(H)=1/3 with independent coins can produce zero odd-tails outcomes
- Verifies: does the structural constraint hold under interpretation 2? P(1 tail) = 6/27 ≠ 0 → condition violated → the independent-coin model is discarded; no model satisfies all the conditions at once
- Final answer: 0
Three elements make this work:
“Don’t assume the standard” — the model has permission to consider alternatives. Normally it doesn’t because “the standard” is safe.
“Emergent property” — the model typically verifies: “Did I use P(heads)=1/3 in my calculations? ✓” That’s not verifying. It should check: “Does my result give P(heads)=1/3 when I calculate the marginal?” Using a constraint is not the same as satisfying it.
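To make that distinction concrete, here is a small illustration of my own (it is not part of the v17b prompt): build the distribution conditioned on even tails, then compute the per-coin marginal P(heads) inside it. The constraint was used to build the distribution, yet the marginal it yields is 5/13, which is exactly the kind of discrepancy step 4 is meant to surface.

```python
# Step 4 in practice: check the constraint on the result, not just in the setup.
from itertools import product
from fractions import Fraction

P_HEADS = Fraction(1, 3)

def prob(outcome: str) -> Fraction:
    p = Fraction(1)
    for c in outcome:
        p *= P_HEADS if c == "H" else 1 - P_HEADS
    return p

outcomes = ["".join(o) for o in product("HT", repeat=3)]

# Result of the conditional reading: the distribution given an even number of tails.
even = [o for o in outcomes if o.count("T") % 2 == 0]
p_even = sum(prob(o) for o in even)
conditioned = {o: prob(o) / p_even for o in even}

# "Did I use P(heads) = 1/3?"  Yes: it is baked into prob().
# "Does my result give P(heads) = 1/3?"  Check the first coin's marginal in the result.
marginal_heads = sum(p for o, p in conditioned.items() if o[0] == "H")
print(marginal_heads)  # 5/13, not 1/3
```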
“You have permission and obligation to discard” — without this phrase, the model presents both interpretations and asks which you prefer. It won’t commit. The word “obligation” is load-bearing: it converts a permission into a duty.
How to deploy it:
Option A — System prompt: put the methodology as prior context, then ask the question.
Option B — Multi-turn: send the methodology first, let the model confirm “Understood”, then send the problem. Option B works better because the model locks in the methodology before seeing the problem.
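As a sketch of Option B, here is one way to wire it up, assuming the OpenAI Python SDK and a placeholder model name; the article doesn't prescribe a stack, so treat every identifier here as an assumption:

```python
# Hypothetical Option B deployment. The OpenAI Python SDK, the model name and the
# confirmation turn are assumptions; swap in whatever client and model you actually use.
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"    # placeholder model name

methodology = """<paste the full v17b methodology from the section above>
Reply with "Understood" and nothing else."""

problem = ("3 coins are flipped. Each coin has P(heads) = 1/3. "
           "The number of tails is always even. What's P(all heads)?")

# Turn 1: lock in the methodology before the model sees the problem.
messages = [{"role": "user", "content": methodology}]
ack = client.chat.completions.create(model=MODEL, messages=messages)
messages.append({"role": "assistant", "content": ack.choices[0].message.content})

# Turn 2: only now send the actual problem.
messages.append({"role": "user", "content": problem})
answer = client.chat.completions.create(model=MODEL, messages=messages)
print(answer.choices[0].message.content)
```

The same two-turn structure works with any chat API; the point is the ordering, not the SDK.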
I also found that more tokens doesn’t mean better results: if the model doesn’t understand the underlying problem, a longer prompt just gives it more space to rationalize.
When it doesn’t work
| Problem type | Does v17b work? |
|---|---|
| Interpretive ambiguity | Yes |
| Pure calculation | Unnecessary (model already does it well) |
| Deep conceptual error | No (doesn’t know that it doesn’t know) |
| External technical knowledge | No (needs tools) |
The technique is specifically designed for problems where the ambiguity is in how to model the conditions, not in the math itself. If the model has a wrong belief baked in, or the problem requires external knowledge it doesn’t have, v17b won’t rescue it.
For a broader view of where this fits, see the taxonomy of LLM failures.
Keep exploring
- Why LLMs Reject Their Own Correct Answers — When the model derives the correct result but calls it a “contradiction”
- The model knows how to reason — it just won’t commit — The 17 iterations that revealed the self-censorship pattern
- More tokens doesn’t mean better — Why longer prompts often make ambiguity worse
- Taxonomy of LLM failures — When to use v17b vs other techniques
- 50+ ChatGPT prompts that actually work — Practical examples you can use today
- Best free AI tools in 2026 — Where to apply these techniques