Taxonomy of LLM failures
TL;DR
- Four types of LLM failure: interpretive ambiguity, pure calculation, conceptual error, and missing external knowledge
- Each has a different solution, in that order: the v17b prompt, extended thinking, a more capable model, web search
- Signs: ambiguity comes with phrases like "always" or "given that", calculation errors vary between runs, conceptual errors come with false confidence, knowledge gaps show up as invented formulas or citations
- Decision tree: identify the type first, then apply the right technique
The four types of failure
| Type | Example | Root cause |
|---|---|---|
| Interpretive ambiguity | Coins: 0 vs 1/13 | Bias toward “standard” |
| Pure calculation | Complex arithmetic | Capacity limit |
| Conceptual error | Confusing marginal with independence | Doesn’t know it doesn’t know |
| External knowledge | Data from specific papers | Doesn’t have the information |
A solution for each type
1. Interpretive ambiguity
- ✅ v17b prompt
- ✅ “Permission to discard”
- ❌ Roleplay / buffer (don’t help)
2. Pure calculation
- ✅ Models with extended thinking (Opus, o1)
- ✅ Code tools (see the sketch after this list)
- ❌ Elaborate prompts (get in the way)
3. Conceptual error
- ❌ No prompt solves it
- ⚠️ Specific hints can help
- ✅ More capable model
4. External knowledge
- ✅ Web search
- ⚠️ Verify extracted data
- ❌ Expecting it to "reason" its way to the answer
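For pure calculation failures, "code tools" means letting deterministic code do the arithmetic the model keeps fumbling. A minimal sketch in Python, using an illustrative problem of my own choosing (the exact probability of at least 8 heads in 10 fair coin flips), not one taken from the posts linked below:

```python
from fractions import Fraction
from math import comb

def prob_at_least_heads(n_flips: int, k_min: int) -> Fraction:
    """Exact probability of getting at least k_min heads in n_flips fair flips."""
    total = Fraction(0)
    for k in range(k_min, n_flips + 1):
        # Exact rational arithmetic: no rounding, no run-to-run drift.
        total += Fraction(comb(n_flips, k), 2 ** n_flips)
    return total

p = prob_at_least_heads(10, 8)
print(p, float(p))  # 7/128 0.0546875
```

The point is the division of labor: the model sets up the expression, the code evaluates it exactly, and the run-to-run numerical drift disappears.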
How to identify the type
Signs of ambiguity:
- The problem contains a word or phrase like "always," "given that," or "it's known that"
- There are multiple ways to model a condition
Signs of calculation:
- The model starts well but gets lost in the numbers
- Different attempts give different numerical results (see the sketch below)
Signs of conceptual error:
- The model says “this is impossible” or “there’s a contradiction”
- Confuses technical terms (marginal vs conditional, correlation vs causation)
Signs of external knowledge:
- The model invents formulas or cites papers that don’t exist
- Different models give completely different answers
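One way to act on the "different attempts give different results" signal is to sample the same prompt several times and compare the final numbers. A rough sketch, assuming a hypothetical `ask_model(prompt)` function that returns the completion text; the answer extraction is deliberately crude:

```python
import re
from typing import Callable, Optional

def final_number(text: str) -> Optional[float]:
    """Pull the last number out of a completion (crude, but often enough)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

def diagnose(prompt: str, ask_model: Callable[[str], str], n: int = 5) -> str:
    """Varying answers point to a calculation failure; a consistent but
    wrong answer points to a conceptual error."""
    # ask_model is a placeholder for whatever client you use to query the model.
    answers = [final_number(ask_model(prompt)) for _ in range(n)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return "no numeric answer found"
    if max(answers) - min(answers) > 1e-9:
        return "answers vary across runs: likely a calculation failure"
    return "answers agree: if the value is wrong, suspect a conceptual error"
```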
Quick decision table
1. Does the problem have ambiguity? → Yes: v17b prompt. No: continue.
2. Is it a complex calculation? → Yes: extended thinking / code. No: continue.
3. Is the model saying something clearly wrong, but with confidence? → Yes: conceptual error; give a specific hint or switch to a better model. No: continue.
4. Does it need data that isn't in the prompt? → Yes: web search + verification. No: it should work; if it still fails, review the prompt.
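The same flow can be written as a small routing function. This is just the table above in code form; deciding how to answer each question is still a judgment call, and the returned strings are shorthand for the techniques above, not an API:

```python
def route_failure(
    has_ambiguity: bool,
    heavy_calculation: bool,
    confident_but_wrong: bool,
    needs_external_data: bool,
) -> str:
    """Map the decision table onto the recommended technique."""
    if has_ambiguity:
        return "v17b prompt"
    if heavy_calculation:
        return "extended thinking / code tools"
    if confident_but_wrong:
        return "conceptual error: specific hint or more capable model"
    if needs_external_data:
        return "web search + verify extracted data"
    return "should work; if it still fails, review the prompt"
```

For example, `route_failure(False, True, False, False)` returns "extended thinking / code tools", matching the second branch of the table.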
This taxonomy is part of my broader prompt engineering guide, which covers how to communicate effectively with LLMs. It also pairs with my look at AI trends for 2026: knowing which failure type you are facing stays useful as model capabilities evolve.
You might also like
- The model knows how to reason. It just won't commit. 17 prompt iterations revealed that the model finds the correct answer but self-censors for not being standard.
- More tokens doesn't mean better results. How an exhaustive meta-prompt caused context overflow and reached the same error on a random walk problem.
- The prompt that solves ambiguous problems. A practical guide to prompt v17b: a methodology for LLMs to identify and discard incorrect interpretations.