Prompt Engineering Guide: How to Talk to LLMs
TL;DR
- Prompting isn’t magic—it’s structured communication
- The 5 elements: context, instruction, format, examples, constraints
- Advanced techniques (chain of thought, few-shot) have their place
- Prompting has real limits you need to know about
- The skill develops through iteration, not memorizing templates
What is Prompt Engineering?
Prompt engineering is the art of communicating with language models in a way that gets you the result you need. It’s not hacking the model. It’s not finding magic words. It’s simply learning to give clear instructions.
Think of it like explaining a task to a very capable colleague who doesn’t know your context. If you say “do the report,” you probably won’t get what you want. If you say “I need a 2-page report on Q4 sales, with charts, to present to the director tomorrow,” the result will be much better.
Why it matters
The difference between a vague prompt and a structured one shows up directly in the output:
| Prompt | Result |
|---|---|
| "Write something about AI" | Generic 500-word article |
| "Write a 1000-word article on AI risks in finance, with 3 real cases and practical recommendations, in a professional but accessible tone" | Specific, useful content |
Same model. Output quality depends on input quality.
What it’s NOT
- Not magic: There are no secret words that unlock superpowers
- Not manipulation: The model has no ego to flatter
- Not universal: What works for one task may not work for another
- Not a silver bullet: There are real limits (more on this later)
The 5 Elements of a Good Prompt
1. Context
Tell the model who you are and what situation you’re facing. Context helps the model calibrate its response.
Bad:
How do I optimize a SQL query?
Better:
I'm a junior data engineer working with PostgreSQL.
I have a query that takes 30 seconds on a 10M row table.
How can I optimize it?
Context doesn’t need to be long. It needs to be relevant.
2. Clear instruction
The instruction is the core of the prompt. It should have:
- Action verb: Write, analyze, compare, summarize, explain
- Object: What you want processed
- Constraints: Limits and conditions
Basic structure:
[Verb] + [what] + [how] + [constraints]
Example:
Analyze this Python code
identifying potential performance issues
and suggest optimizations
without changing the business logic.
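If you build prompts in code, that structure maps neatly onto a small helper. A minimal sketch in Python (the `build_instruction` function is illustrative, not from any library):

```python
# Illustrative helper: compose an instruction as [Verb] + [what] + [how] + [constraints].
def build_instruction(verb: str, obj: str, how: str, constraints: str) -> str:
    return f"{verb} {obj}, {how}, {constraints}."

prompt = build_instruction(
    verb="Analyze",
    obj="this Python code",
    how="identifying potential performance issues and suggesting optimizations",
    constraints="without changing the business logic",
)
print(prompt)
# -> "Analyze this Python code, identifying potential performance issues
#     and suggesting optimizations, without changing the business logic."
```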
3. Output format
If you don’t specify format, the model will choose one. Sometimes it’ll guess right. Sometimes it won’t.
Specify format when you need it:
Respond in JSON format with this structure:
{
"summary": "...",
"key_points": ["...", "..."],
"next_step": "..."
}
Respond with a markdown table comparing:
| Feature | Option A | Option B |
Respond in 3 paragraphs maximum, no lists.
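When you ask for JSON, it's worth validating what comes back before you use it downstream. A minimal sketch in Python, assuming the model's raw reply is in a hypothetical `response_text` variable:

```python
import json

# response_text is a hypothetical variable holding the model's raw reply.
response_text = '{"summary": "Q4 sales grew 12%", "key_points": ["..."], "next_step": "..."}'

try:
    data = json.loads(response_text)
except json.JSONDecodeError:
    # Models sometimes wrap JSON in prose or markdown fences; handle that case.
    raise ValueError("Response was not valid JSON; consider re-prompting.")

# Check that the structure you asked for is actually there.
required_keys = {"summary", "key_points", "next_step"}
missing = required_keys - data.keys()
if missing:
    print(f"Missing keys: {missing}")  # re-prompt or fall back
```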
4. Examples (Few-shot)
Showing examples is more effective than explaining rules. The model learns from patterns.
Without examples:
Classify these tweets by sentiment: positive, negative, neutral.
With examples (few-shot):
Classify these tweets by sentiment.
Examples:
- "I love this product!" → positive
- "Terrible service, never again" → negative
- "Package arrived today" → neutral
Now classify:
- "It's not bad, but I expected more"
- "Amazing experience!"
Rule of thumb: 2-3 examples are usually enough; adding more isn't always better.
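If you're classifying items in bulk, you can assemble the few-shot block programmatically instead of retyping it. A minimal sketch (the helper is illustrative, not from any SDK):

```python
# Illustrative helper for assembling a few-shot classification prompt.
EXAMPLES = [
    ('"I love this product!"', "positive"),
    ('"Terrible service, never again"', "negative"),
    ('"Package arrived today"', "neutral"),
]

def few_shot_prompt(items: list[str]) -> str:
    lines = ["Classify these tweets by sentiment.", "", "Examples:"]
    lines += [f"- {text} → {label}" for text, label in EXAMPLES]
    lines += ["", "Now classify:"]
    lines += [f"- {item}" for item in items]
    return "\n".join(lines)

print(few_shot_prompt(['"It\'s not bad, but I expected more"', '"Amazing experience!"']))
```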
5. Constraints
Tell the model what NOT to do. This is as important as telling it what to do.
Useful constraints:
- Don't make up data. If you don't know something, say "I don't have that information."
- Don't use technical jargon; the audience is non-technical.
- Don't include code; just explain the concept.
- Stick to facts from the attached document only.
Advanced Techniques
Chain of Thought (CoT)
Ask the model to reason step by step. Useful for complex problems where reasoning matters.
Without CoT:
How many days between March 15, 2024 and June 22, 2024?
With CoT:
How many days between March 15, 2024 and June 22, 2024?
Think step by step, counting days in each month.
The CoT version is slower but more accurate for calculations and logic.
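For arithmetic like this, you can also verify the answer outside the model entirely, which is often more reliable than more prompting. A quick check with Python's standard library:

```python
from datetime import date

# March 15 → March 31 is 16 days, plus April (30), May (31), and 22 days of June: 99.
delta = date(2024, 6, 22) - date(2024, 3, 15)
print(delta.days)  # 99
```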
Role Playing
Assign a specific role to the model. Useful when you need a particular perspective.
You are a senior code reviewer with 15 years of Python experience.
Your job is to find subtle bugs and maintainability issues.
Be critical but constructive.
Warning: Roleplay doesn’t make the model more capable. A model that doesn’t know advanced math won’t learn it by pretending to be a mathematician. Roleplay adjusts tone and focus, not capabilities.
Few-shot with variations
When the pattern is complex, show variations in examples:
Convert these descriptions into Python variable names.
Examples:
- "the user's name" → user_name
- "total number of attempts" → total_attempts
- "is active or not" → is_active
- "list of favorite products" → favorite_products
Now convert:
- "last modification date"
- "has admin permissions"
Self-verification
Ask the model to verify its own answer. Works better than you’d think.
Solve this problem.
Then verify your answer by substituting the values
and check that the result is consistent.
If you find an error, correct it.
The Limits of Prompting
This is where most guides fall short. Prompting has real limits, and knowing them will save you frustration.
My experiment: 17 iterations
After testing a probability problem with 17 different prompt versions, I discovered something that changed how I see prompting:
The model found the correct answer in its reasoning… and then discarded it for “not being standard.”
It wasn’t a prompt problem. The model KNEW the answer but didn’t dare give it. This led me to develop a taxonomy of failures that determines which technique to use in each case.
The 4 Types of Failure
| Failure type | Example | Solution |
|---|---|---|
| Interpretive ambiguity | Problem has multiple valid readings | Give explicit permission to explore alternatives |
| Pure calculation | Complex arithmetic, many steps | Extended thinking or code tools |
| Conceptual error | Model confuses technical concepts | More capable model or specific hints |
| External knowledge | Needs data it doesn’t have | Web search + verification |
The key: Diagnose the failure type before modifying the prompt. A more elaborate prompt for a type 2 problem (calculation) doesn’t help; it gets in the way.
The complete case study
I documented the entire process in a series of posts:
- The model knows how to reason. It just won’t commit - The initial discovery
- It got to 0 and called it a contradiction - Why separating contexts isn’t enough
- More tokens isn’t a better result - The limits of brute force
- The prompt that solves ambiguous problems - The solution: prompt v17b
- Taxonomy of LLM failures - When to use each technique
If you want to dive deep into prompting limits, that series will give you a perspective you won’t find in generic tutorials.
Common Mistakes and How to Avoid Them
1. Being too vague
Bad: "Help me with my code"
Good: "I'm getting an AttributeError in this Python function. The message is 'NoneType' object has no attribute 'split'. Why is this happening and how do I fix it?"
2. Giving too many instructions
Mile-long prompts confuse more than they help. If your prompt has 15 instructions, you probably need to split the task.
Bad: A 500-word prompt with 12 conditions
Good: Split into 2-3 sequential steps
3. Not specifying format
If you need JSON, ask for JSON. If you need bullet points, ask for them. The model doesn’t read minds.
4. Expecting the model to “guess”
The model optimizes for giving a plausible answer, not for asking what you meant. If there’s ambiguity, it’ll resolve it its own way.
Bad: "Do an analysis" (of what? how deep? for whom?)
Good: "Do a SWOT analysis of this startup, one page, to present to investors"
5. Not iterating
Prompting is iterative. Your first prompt will almost never be final. Read the response, identify what’s missing or extra, and adjust.
Normal flow:
- Initial prompt → partially useful response
- Adjust instructions → better but missing format
- Add format → almost perfect but too long
- Add length constraint → done
Tools and Resources
Claude Projects
If you use Claude, Projects let you save persistent context: documents, instructions, and constraints that apply to the entire conversation.
Useful for:
- Long projects where you repeat the same context
- Specific writing styles
- Reference documentation
System prompts
The system prompt is context the model “remembers” throughout the conversation. For example:
You are a programming assistant specialized in Python.
When giving code, always include explanatory comments.
Use type hints.
Prefer simple solutions over elegant ones.
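If you're calling the model from code rather than a chat UI, the system prompt is a parameter on the API call. A minimal sketch using the Anthropic Python SDK (the model name is a placeholder; check the current docs for available models):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a programming assistant specialized in Python. "
    "When giving code, always include explanatory comments. "
    "Use type hints. Prefer simple solutions over elegant ones."
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you have access to
    max_tokens=1024,
    system=SYSTEM_PROMPT,  # persists as context for the whole conversation
    messages=[{"role": "user", "content": "Write a function that deduplicates a list."}],
)
print(message.content[0].text)
```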
Templates for common cases
For text analysis:
Analyze the following text and extract:
1. Main topic
2. Tone (formal/informal/technical)
3. 3 key points
4. Possible biases or limitations
Text:
[paste text]
For code review:
Review this code looking for:
1. Bugs or logic errors
2. Performance issues
3. Best practice violations
4. Improvement suggestions
Prioritize by impact. Don't comment on minor style issues.
Code:
[paste code]
For summaries:
Summarize this document as:
- 1 context sentence
- 3-5 bullet points with key takeaways
- 1 conclusion sentence
Maximum 200 words total.
Document:
[paste document]
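If you reuse these templates often, a plain format string saves retyping them. A minimal sketch using the summary template above (the helper name is illustrative):

```python
# Illustrative reusable template: fill in the word limit and the document text.
SUMMARY_TEMPLATE = """Summarize this document as:
- 1 context sentence
- 3-5 bullet points with key takeaways
- 1 conclusion sentence
Maximum {max_words} words total.

Document:
{document}"""

def render_summary_prompt(document: str, max_words: int = 200) -> str:
    return SUMMARY_TEMPLATE.format(max_words=max_words, document=document)

print(render_summary_prompt("(paste document here)"))
```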
Conclusion
Prompting is a skill developed through practice, not memorizing templates. The 5 elements (context, instruction, format, examples, constraints) are your foundation. Advanced techniques are specific tools for specific problems.
But more importantly: know the limits. Knowing when the problem ISN’T the prompt will save you hours of frustration. Sometimes you need a more capable model. Sometimes you need external tools. Sometimes the model simply can’t do what you’re asking.
Good prompt engineering isn’t writing the perfect prompt. It’s knowing which prompt to write for each situation.
Want practical examples to copy and use? Check out my 50 tested prompts for ChatGPT—they work on any LLM. Looking for the right AI to apply these techniques to? My comparison of ChatGPT, Claude, and Gemini will help you choose. And if you’re coding with AI, Cursor uses prompting in the context of your entire codebase.