Taxonomy of LLM failures
TL;DR
- Four types of LLM failure: interpretive ambiguity, pure calculation, conceptual error, and missing external knowledge
- Each has a different solution, in that order: the v17b prompt, extended thinking, a more capable model, web search
- Signs: ambiguity comes with phrases like "always" or "given that", calculation errors vary between runs, conceptual errors come with false confidence, knowledge gaps show up as invented formulas or citations
- Decision tree: identify the type first, then apply the right technique
The four types of failure
| Type | Example | Root cause |
|---|---|---|
| Interpretive ambiguity | Coins: 0 vs 1/13 | Bias toward “standard” |
| Pure calculation | Complex arithmetic | Capacity limit |
| Conceptual error | Confusing marginal with independence | Doesn’t know it doesn’t know |
| External knowledge | Data from specific papers | Doesn’t have the information |
A solution for each type
1. Interpretive ambiguity
- ✅ v17b prompt
- ✅ “Permission to discard”
- ❌ Roleplay / buffer (don’t help)
2. Pure calculation
- ✅ Models with extended thinking (Opus, o1)
- ✅ Code tools (see the sketch after this list)
- ❌ Elaborate prompts (get in the way)
3. Conceptual error
- ❌ No prompt solves it
- ⚠️ Specific hints can help
- ✅ More capable model
4. External knowledge
- ✅ Web search
- ⚠️ Verify extracted data
- ❌ Expecting it to "reason" its way to the answer
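For pure calculation failures, "code tools" means letting deterministic code do the arithmetic the model keeps fumbling. A minimal sketch in Python, using an illustrative problem of my own choosing (the exact probability of at least 8 heads in 10 fair coin flips), not one taken from the posts linked below:

```python
from fractions import Fraction
from math import comb

def prob_at_least_heads(n_flips: int, k_min: int) -> Fraction:
    """Exact probability of getting at least k_min heads in n_flips fair flips."""
    total = Fraction(0)
    for k in range(k_min, n_flips + 1):
        # Exact rational arithmetic: no rounding, no run-to-run drift.
        total += Fraction(comb(n_flips, k), 2 ** n_flips)
    return total

p = prob_at_least_heads(10, 8)
print(p, float(p))  # 7/128 0.0546875
```

The point is the division of labor: the model sets up the expression, the code evaluates it exactly, and the run-to-run numerical drift disappears.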
How to identify the type
Signs of ambiguity:
- The problem contains a word or phrase like "always," "given that," or "it's known that"
- There are multiple ways to model a condition
Signs of calculation:
- The model starts well but gets lost in the numbers
- Different attempts give different numerical results (see the sketch below)
Signs of conceptual error:
- The model says “this is impossible” or “there’s a contradiction”
- Confuses technical terms (marginal vs conditional, correlation vs causation)
Signs of external knowledge:
- The model invents formulas or cites papers that don’t exist
- Different models give completely different answers
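One way to act on the "different attempts give different results" signal is to sample the same prompt several times and compare the final numbers. A rough sketch, assuming a hypothetical `ask_model(prompt)` function that returns the completion text; the answer extraction is deliberately crude:

```python
import re
from typing import Callable, Optional

def final_number(text: str) -> Optional[float]:
    """Pull the last number out of a completion (crude, but often enough)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None

def diagnose(prompt: str, ask_model: Callable[[str], str], n: int = 5) -> str:
    """Varying answers point to a calculation failure; a consistent but
    wrong answer points to a conceptual error."""
    # ask_model is a placeholder for whatever client you use to query the model.
    answers = [final_number(ask_model(prompt)) for _ in range(n)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return "no numeric answer found"
    if max(answers) - min(answers) > 1e-9:
        return "answers vary across runs: likely a calculation failure"
    return "answers agree: if the value is wrong, suspect a conceptual error"
```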
Quick decision table
1. Does the problem have ambiguity? → Yes: v17b prompt. No: continue.
2. Is it a complex calculation? → Yes: extended thinking / code. No: continue.
3. Is the model saying something clearly wrong, but with confidence? → Yes: conceptual error; give a specific hint or switch to a better model. No: continue.
4. Does it need data that isn't in the prompt? → Yes: web search + verification. No: it should work; if it still fails, review the prompt.
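The same flow can be written as a small routing function. This is just the table above in code form; deciding how to answer each question is still a judgment call, and the returned strings are shorthand for the techniques above, not an API:

```python
def route_failure(
    has_ambiguity: bool,
    heavy_calculation: bool,
    confident_but_wrong: bool,
    needs_external_data: bool,
) -> str:
    """Map the decision table onto the recommended technique."""
    if has_ambiguity:
        return "v17b prompt"
    if heavy_calculation:
        return "extended thinking / code tools"
    if confident_but_wrong:
        return "conceptual error: specific hint or more capable model"
    if needs_external_data:
        return "web search + verify extracted data"
    return "should work; if it still fails, review the prompt"
```

For example, `route_failure(False, True, False, False)` returns "extended thinking / code tools", matching the second branch of the table.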
This taxonomy is part of my broader prompt engineering guide, which covers how to communicate effectively with LLMs. It also pairs with my look at AI trends for 2026: knowing which failure type you are facing stays useful as model capabilities evolve.
You might also like
- The model knows how to reason. It just won't commit. 17 prompt iterations revealed that the model finds the correct answer but self-censors for not being standard.
- More tokens doesn't mean better results. How an exhaustive meta-prompt caused context overflow and reached the same error on a random walk problem.
- The prompt that solves ambiguous problems. A practical guide to prompt v17b: a methodology for LLMs to identify and discard incorrect interpretations.