PromptWright — Build & Test AI Prompts

# Few-Shot Prompting: How to Teach AI With Examples for Better Results

Sometimes the clearest way to explain what you want is not to describe it, but to show it. Few-shot prompting takes advantage of this by including a small number of input-output examples directly in your prompt, demonstrating the pattern you want the AI to follow. Instead of trying to explain a format, tone, classification scheme, or transformation in abstract terms, you give the model two or three concrete examples and let it generalize from there. For tasks that are hard to pin down in words, the result is usually more accurate and more consistent than a zero-shot prompt that relies purely on instructions, and it often takes less trial and error to get there.

This guide covers what few-shot prompting is, when to use it, how to structure effective examples, how many examples to include, and the common mistakes that undermine otherwise good prompts. You'll come away with templates and techniques you can apply immediately.

## What Is Few-Shot Prompting?

Few-shot prompting is the practice of including a small number of worked examples (the "shots") in your prompt to show the model the exact input-to-output pattern you want. The model uses these examples to infer the task and produce output that matches the demonstrated pattern.

The terminology comes from machine learning:

- **Zero-shot prompting:** You describe the task in words with no examples. "Translate the following sentence to French: [sentence]."
- **One-shot prompting:** You include one example. "Translate to French. English: Hello. French: Bonjour. Now translate: [sentence]."
- **Few-shot prompting:** You include two to several examples that demonstrate the pattern.

Few-shot prompting sits in a sweet spot: it gives the model enough signal to reliably reproduce a pattern without requiring massive training data or fine-tuning. For most tasks, two to five well-chosen examples are enough to dramatically improve output quality.

## Why Few-Shot Prompting Works

Understanding why few-shot prompting works helps you use it more effectively. It's not magic — it's a consequence of how modern language models are trained and how they process context.

### Models Learn Patterns From Context

Language models are trained to predict the next token given preceding context. When you place examples in the prompt, the model treats them as context and naturally continues the pattern. If every example shows "Input: X → Output: Y," the model sees that pattern repeated and extends it to your new input. You're not "teaching" the model in the training sense — you're demonstrating a pattern it can recognize and continue.

### Examples Are Less Ambiguous Than Instructions

Instructions rely on the model interpreting words the same way you do. "Formal tone" means different things to different people, and the model's interpretation might not match yours. An example of formal tone removes ambiguity — the model sees exactly what you mean. This is why few-shot prompting is especially powerful for subjective or hard-to-define outputs like tone, style, formatting, and classification boundaries.

### Examples Constrain the Output Space

Without examples, the model has many valid-seeming ways to respond, and it may not pick the one you want. Examples narrow the space of acceptable responses by showing the model the specific format, length, and style you're looking for. This is why few-shot is so effective for structured output: the model sees that outputs are, say, always three bullet points under 15 words each, and it reliably reproduces that structure.
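
If you rely on examples to pin down a structure like the one just described, it helps to check that the output actually follows it. Here is a minimal sketch of such a check for the "three bullet points under 15 words each" case. The function name and the `- ` bullet marker are assumptions made for illustration, not part of any library.

```python
def matches_bullet_format(output: str, bullets: int = 3, max_words: int = 15) -> bool:
    """Return True if output has exactly `bullets` "- " lines, each under `max_words` words."""
    # Ignore blank lines so trailing newlines don't cause false failures.
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if len(lines) != bullets:
        return False
    return all(
        line.startswith("- ") and len(line[2:].split()) < max_words
        for line in lines
    )


good = "- Ships in two days\n- Free returns for 30 days\n- Support replies within an hour"
bad = "Here are the highlights:\n- Ships in two days\n- Free returns"

print(matches_bullet_format(good))  # True
print(matches_bullet_format(bad))   # False
```

A check like this is cheap to run on every response, so format drift shows up immediately instead of in a downstream parser.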

### Examples Encode Edge Cases

If you only describe a task in words, the model has to guess how to handle edge cases it hasn't been explicitly told about. Examples let you demonstrate edge cases directly — include an example of a tricky input and show the correct handling. The model generalizes from the demonstrated handling to similar cases.

## When to Use Few-Shot Prompting

Few-shot prompting isn't the right tool for every task. Here's how to decide.

### Use Few-Shot Prompting For:

- **Classification tasks:** Sentiment analysis, intent detection, spam filtering, category assignment. Examples make boundaries clear and consistent.
- **Formatting and transformation:** Converting data from one format to another (text to JSON, unstructured to structured, one style to another). Examples show the exact target format.
- **Style and tone control:** Matching a brand voice, formality level, or writing style. Examples remove ambiguity that instructions can't fully resolve.
- **Pattern extraction:** Pulling specific types of information out of text (entities, relationships, key phrases). Examples demonstrate what counts as relevant.
- **Consistency-sensitive tasks:** Any task where you need the model to apply the same criteria the same way across many inputs. Examples lock in the criteria.

### Skip Few-Shot For:

- **Simple, unambiguous tasks:** "Summarize this in one sentence" or "Translate this to Spanish" — these are well-established tasks where zero-shot works reliably. Adding examples adds token cost without improving quality.
- **Creative generation with wide latitude:** If you want creative variety (brainstorming ideas, story generation), overly specific examples can make the model copy them too closely, reducing originality.
- **Tasks where instructions are clearer than examples:** Some tasks are easier to describe precisely in words than to demonstrate. If your instructions are unambiguous and the model follows them reliably, don't add examples for the sake of having them.

## How to Structure a Few-Shot Prompt

The structure of your few-shot prompt matters as much as the examples themselves. A well-structured prompt makes the pattern obvious; a poorly structured one confuses the model.

### The Basic Structure

A few-shot prompt has three parts:

1. **Optional instruction:** A brief statement of what the task is. Not always needed — the examples can speak for themselves — but a short instruction sets context.
2. **Examples:** Two to five input-output pairs that demonstrate the pattern.
3. **The actual input:** The new input you want the model to process, formatted exactly like the inputs in the examples.

### Format Consistency Is Critical

The most important principle: **format the examples consistently and make the actual input match that format exactly.** If your examples show "Input: [text] → Output: [text]" but your actual input is just raw text with no "Input:" prefix, the model may not recognize it as the next item in the pattern and can respond in a different format or treat the text as an instruction. The format of the new input should look like an incomplete example that the model needs to complete.

Here's a well-structured few-shot prompt:

```
Classify the sentiment of each review as Positive, Negative, or Neutral.

Review: "I love this product, it works perfectly."
Classification: Positive

Review: "The item arrived broken and customer service ignored me."
Classification: Negative

Review: "It does what it says. Nothing special, nothing wrong."
Classification: Neutral

Review: "Amazing quality for the price, would buy again."
Classification:
```

Note how the final input mirrors the example format exactly, and the model's job is to complete the "Classification:" line. This structure makes the pattern unmistakable.

### Choose a Clear Delimiter

Use a consistent delimiter between examples. Blank lines, numbered prefixes, or explicit "Input/Output" labels all work. What matters is consistency. If you use "Input:" and "Output:" labels in the examples, use the same labels for the real input. Mixed delimiters force the model to infer a pattern from inconsistent context, which degrades accuracy.
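
One way to guarantee this consistency is to build the prompt in code rather than by hand. The sketch below assembles the three parts described earlier (instruction, examples, new input) with the same labels and blank-line delimiter throughout. The `Example` class and `build_few_shot_prompt` function are hypothetical helpers written for this article, not an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One demonstration: an input text and the output we want for it."""
    input_text: str
    output_text: str


def build_few_shot_prompt(
    instruction: str,
    examples: list[Example],
    new_input: str,
    input_label: str = "Review",
    output_label: str = "Classification",
) -> str:
    """Assemble instruction, examples, and the new input in one consistent format."""
    blocks = [instruction] if instruction else []
    for ex in examples:
        blocks.append(f'{input_label}: "{ex.input_text}"\n{output_label}: {ex.output_text}')
    # The new input reuses the same labels and ends on a bare output label,
    # so it reads as an unfinished example for the model to complete.
    blocks.append(f'{input_label}: "{new_input}"\n{output_label}:')
    # A blank line is the single delimiter between every block.
    return "\n\n".join(blocks)


examples = [
    Example("I love this product, it works perfectly.", "Positive"),
    Example("The item arrived broken and customer service ignored me.", "Negative"),
    Example("It does what it says. Nothing special, nothing wrong.", "Neutral"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as Positive, Negative, or Neutral.",
    examples,
    "Amazing quality for the price, would buy again.",
)
print(prompt)
```

Because the labels are parameters, the same helper works for the other label pairs used in this guide, such as "Feedback"/"Category" or "Informal"/"Formal".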

## How Many Examples to Include

More examples aren't always better. The right number depends on the task, but here are practical guidelines:

### Two to Three Examples for Most Tasks

For most classification, formatting, and style tasks, two or three well-chosen examples capture the pattern. More examples add token cost and can introduce noise without meaningfully improving accuracy. Start with two and add a third only if you see inconsistent results.

### Use More Examples When:

- **The pattern is complex:** Multi-step transformations, intricate formats, or tasks with many constraints benefit from more examples showing different cases.
- **Edge cases matter:** If you want the model to handle unusual inputs correctly, include examples of those unusual inputs.
- **Consistency across a wide range of inputs is critical:** For production pipelines processing diverse inputs, more examples covering the range of input types improves reliability.

### Use Fewer Examples When:

- **The pattern is simple:** A single example may suffice for straightforward transformations.
- **Token cost matters:** Each example adds tokens that count against your budget. For high-volume tasks, minimize examples to control cost.
- **The model already handles the task well zero-shot:** If the model's zero-shot output is already good, examples may add little. Run a quick comparison — if zero-shot works, skip the examples.

## Choosing Good Examples

The examples you choose matter more than the number. Bad examples teach the wrong pattern. Here's how to choose examples that improve output.

### Make Examples Representative

Your examples should reflect the kinds of inputs the model will actually see. If you're classifying customer reviews, your examples should be realistic reviews, not artificial ones. If the real inputs are short and informal, don't use long, formal examples. Match the distribution of real inputs as closely as possible.

### Cover the Output Space

If your task has three possible output classes, include at least one example of each class. If you're doing sentiment classification with Positive, Negative, and Neutral labels, make sure your examples include all three. If you only show Positive and Negative examples, the model may never produce Neutral, even when the input warrants it. Coverage of the output space ensures the model knows all valid responses.
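
This audit is easy to automate. The following sketch compares the labels used in your examples against the full set of valid labels. The function name and report keys are illustrative assumptions.

```python
def audit_label_coverage(valid_labels: set[str], example_labels: list[str]) -> dict[str, set[str]]:
    """Report labels that are never demonstrated and labels that are not valid."""
    shown = set(example_labels)
    return {
        "missing": valid_labels - shown,      # valid outputs with no example
        "unexpected": shown - valid_labels,   # typos or labels outside the scheme
    }


report = audit_label_coverage(
    valid_labels={"Positive", "Negative", "Neutral"},
    example_labels=["Positive", "Negative", "Positive"],
)
print(report["missing"])     # {'Neutral'}
print(report["unexpected"])  # set()
```

The "unexpected" set is a small bonus: it catches a misspelled label in an example before the model starts copying it.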

### Include Edge Cases

The cases where models fail are usually edge cases, not typical inputs. If your task has known tricky cases — ambiguous inputs, near-boundary examples, multi-label situations — include examples of how you want them handled. An edge-case example prevents the model from guessing wrong on similar inputs later.

### Avoid Misleading Examples

A common mistake is choosing examples that teach an unintended pattern. If all your Positive examples mention the word "love," the model might learn that "love" is the signal for Positive and misclassify a review that says "I love the concept but the execution is terrible" as Positive. Choose examples that demonstrate the actual reasoning you want, not superficial patterns. Diversify the vocabulary and structure of your examples so the model learns the underlying pattern rather than surface features.
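
A rough way to spot this kind of accidental signal is to look for words that appear in every example of a class. The sketch below is a simple heuristic, not a guarantee. The stopword list is a small hand-picked assumption, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; extend it for real use.
STOPWORDS = {"i", "the", "a", "it", "is", "this", "and", "to"}


def shared_words_by_label(examples: list[tuple[str, str]]) -> dict[str, set[str]]:
    """For each label with 2+ examples, return the non-stopwords every example shares."""
    words_per_label = defaultdict(list)
    for text, label in examples:
        words_per_label[label].append(set(re.findall(r"[a-z']+", text.lower())))
    return {
        label: set.intersection(*word_sets) - STOPWORDS
        for label, word_sets in words_per_label.items()
        if len(word_sets) > 1
    }


shared = shared_words_by_label([
    ("I love this product.", "Positive"),
    ("I love how fast it is.", "Positive"),
    ("Arrived broken.", "Negative"),
    ("Stopped working after a day.", "Negative"),
])
print(shared)  # {'Positive': {'love'}, 'Negative': set()}
```

Here every Positive example contains "love", which is a hint to swap one of them for a positive review phrased differently.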

### Keep Examples Concise

Long, verbose examples cost more tokens and can bury the pattern. Keep examples as concise as possible while still being realistic. If a full review is 200 words, you can often use a representative 30-word excerpt. The model learns the pattern from the input-output relationship, not from the example length.

## Few-Shot Prompting Templates

Here are practical, adaptable templates for common few-shot use cases.

### Template 1: Text Classification

```
Classify the customer feedback into one of these categories: Bug, Feature
Request, Praise, Question, or Complaint.

Feedback: "The app crashes when I try to upload a photo."
Category: Bug

Feedback: "It would be great if I could export my data as CSV."
Category: Feature Request

Feedback: "This is the best tool I've used all year, thank you!"
Category: Praise

Feedback: "How do I change my notification settings?"
Category: Question

Feedback: "The new update ruined the interface, it's unusable now."
Category: Complaint

Feedback: "[your input here]"
Category:
```

### Template 2: Format Transformation

```
Convert the meeting note into the structured format shown below.

Note: "Decided to launch the feature next Tuesday. Sarah will handle QA.
Concerns about server capacity — John to investigate."

Output:
Decisions:
- Launch the feature next Tuesday
Action Items:
- Sarah: Handle QA
- John: Investigate server capacity concerns
Open Questions: None

Note: "Talked about redesigning the dashboard. No final decision. Need user
research first. Mary to recruit participants by Friday."

Output:
Decisions: None (discussion only)
Action Items:
- Mary: Recruit user research participants by Friday
Open Questions:
- Should we redesign the dashboard?

Note: "[your input here]"
Output:
```

### Template 3: Style Transfer

```
Rewrite each sentence in a formal, professional tone.

Informal: "Hey, can you send me that report ASAP?"
Formal: "Could you please send me the report at your earliest convenience?"

Informal: "This is totally broken, fix it now."
Formal: "There appears to be an issue that requires immediate attention."

Informal: "Thanks a bunch for the help!"
Formal: "Thank you very much for your assistance."

Informal: "[your input here]"
Formal:
```

### Template 4: Information Extraction

```
Extract the key entities from each product description.

Description: "The Acme X200 wireless headphones offer 30 hours of battery
life and active noise cancellation. Available in black and silver."
Entities:
- Product: Acme X200 wireless headphones
- Battery life: 30 hours
- Features: active noise cancellation
- Colors: black, silver

Description: "The Zenith Pro laptop features a 14-inch OLED display, 16GB
RAM, and weighs 1.2kg. Comes in space gray."
Entities:
- Product: Zenith Pro laptop
- Display: 14-inch OLED
- RAM: 16GB
- Weight: 1.2kg
- Colors: space gray

Description: "[your input here]"
Entities:
```

### Template 5: Tone-Adjusted Response Generation

```
Generate a customer service response matching the tone shown in the examples.

Customer: "I've been waiting two weeks for my order, this is unacceptable."
Response (empathetic, professional): "I sincerely apologize for the delay.
That's not the experience we want you to have. Let me look into your order
right away and get this resolved for you."

Customer: "Your product stopped working after one day."
Response (empathetic, professional): "I'm so sorry to hear that — that's
certainly not our standard. Let's get this sorted out. Could you share your
order number so I can arrange a replacement or refund?"

Customer: "[your input here]"
Response (empathetic, professional):
```

## Common Mistakes in Few-Shot Prompting

Even experienced prompt engineers make these mistakes. Recognizing them saves you hours of debugging.

### Mistake 1: Inconsistent Formatting Between Examples and Input

Your examples use one format; your actual input uses another. The model gets confused because the pattern doesn't extend cleanly. Always make the real input look like an unfinished example — same labels, same delimiters, same structure.

### Mistake 2: Examples That Don't Cover the Output Space

You show examples of two output classes but the task has five. The model never produces the classes it hasn't seen demonstrated. Audit your examples: does every valid output have at least one example?

### Mistake 3: Too Many Examples That Overwhelm

You include ten examples in an attempt to be thorough. The prompt becomes long and expensive, and the examples start to introduce conflicting patterns. For most tasks, two to five well-chosen examples outperform ten. Quality over quantity.

### Mistake 4: Examples With Surface Patterns the Model Copies

Your examples all start the same way or use the same vocabulary, and the model learns to mimic those surface features instead of the underlying pattern. Diversify your examples so the model can't shortcut to a superficial rule. Include examples that vary in length, vocabulary, and structure while preserving the input-output relationship you want.

### Mistake 5: Order Effects

Models can be sensitive to the order of examples. If examples of one class always appear in the same position or are clustered together (all Positive examples first, for instance), the model's predictions can skew toward or away from that class. Shuffle the order of your examples, especially for classification tasks, so the model can't learn an order-based shortcut. This is a subtle but real effect, particularly with smaller or less capable models.
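
Shuffling is simple to do reproducibly. This sketch uses a seeded random generator so each ordering can be recreated later. The function name is an assumption for illustration.

```python
import random


def shuffled_examples(examples: list[tuple[str, str]], seed: int) -> list[tuple[str, str]]:
    """Return a shuffled copy of the examples; the same seed gives the same order."""
    rng = random.Random(seed)
    shuffled = list(examples)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled


examples = [
    ("I love this product, it works perfectly.", "Positive"),
    ("Amazing quality for the price.", "Positive"),
    ("The item arrived broken.", "Negative"),
    ("It does what it says.", "Neutral"),
]

for seed in range(3):
    order = [label for _, label in shuffled_examples(examples, seed)]
    print(seed, order)
```

Running your test inputs against several seeds is a quick way to see whether accuracy depends on ordering. If results swing noticeably between seeds, the prompt is order-sensitive.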

### Mistake 6: Not Testing on New Inputs

Your examples work beautifully on the inputs you used to design them, but the model's behavior on genuinely new inputs is untested. Always hold out a set of test inputs that weren't used to design the prompt. Run them through and check accuracy. This catches overfitting to your examples.
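
A hold-out split can be as simple as the sketch below, which sets aside a fraction of your labeled inputs before you start writing the prompt. The function name and the 30% default are assumptions, not a recommendation from any particular tool.

```python
import random


def split_design_holdout(items: list, holdout_fraction: float = 0.3, seed: int = 0) -> tuple[list, list]:
    """Split items into (design, holdout); the holdout is never used while writing the prompt."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * holdout_fraction)
    return shuffled[cut:], shuffled[:cut]


labeled_inputs = [f"input {i}" for i in range(10)]
design, holdout = split_design_holdout(labeled_inputs, holdout_fraction=0.3, seed=42)
print(len(design), len(holdout))  # 7 3
```

Pick your few-shot examples only from `design`, and look at `holdout` only when measuring.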

## Few-Shot vs. Fine-Tuning: When to Use Which

Few-shot prompting and fine-tuning both improve model behavior, but they serve different purposes:

### Use Few-Shot Prompting When:

- You have a small number of examples (two to twenty)
- You need quick iteration — you can change examples and test immediately
- The task or pattern changes frequently
- You don't have the resources or expertise for fine-tuning
- You're prototyping or validating an approach

### Use Fine-Tuning When:

- You have hundreds or thousands of high-quality examples
- The task is stable and recurring at high volume
- Few-shot prompting can't achieve the accuracy you need
- Token cost of including examples at inference time is prohibitive
- You need consistent behavior that doesn't vary with prompt formulation

For most practical prompt engineering work, few-shot prompting is the starting point. Fine-tuning becomes worthwhile only when you have a proven task that few-shot can't fully crack and enough data to justify the investment.

## Advanced Few-Shot Techniques

Once you're comfortable with basic few-shot prompting, these techniques help with harder tasks.

### Chain-of-Thought Few-Shot

For reasoning tasks (math, logic, multi-step inference), include the reasoning steps in your examples, not just the final answer. The model learns to produce the reasoning, which improves accuracy on complex questions.

```
Q: A store sells pencils at 3 for $1. How much do 18 pencils cost?
A: 18 pencils is 18 / 3 = 6 groups of 3 pencils. Each group costs $1, so
18 pencils cost 6 × $1 = $6.00. The answer is $6.00.

Q: If 5 machines make 5 widgets in 5 minutes, how long does it take 100
machines to make 100 widgets?
A: 5 machines make 5 widgets in 5 minutes, so each machine makes 1 widget
in 5 minutes. 100 machines each making 1 widget in 5 minutes takes 5
minutes. The answer is 5 minutes.

Q: [your question here]
A:
```

The model sees not just the question-answer pair but the reasoning process, and it reproduces that reasoning before producing the answer. This is especially powerful for tasks where the reasoning matters as much as the answer.
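
Because the examples end with a fixed phrase ("The answer is ..."), the final answer can be pulled out of the model's reasoning in code. The sketch below assumes the model copies that closing phrase on the last line of its response. The pattern and function name are illustrative.

```python
import re
from typing import Optional

# Matches the closing phrase used in the examples, up to an optional final period.
ANSWER_PATTERN = re.compile(r"The answer is (.+?)\.?\s*$")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after "The answer is", or None if the phrase is missing."""
    match = ANSWER_PATTERN.search(completion.strip())
    return match.group(1) if match else None


completion = (
    "5 machines make 5 widgets in 5 minutes, so each machine makes 1 widget "
    "in 5 minutes. The answer is 5 minutes."
)
print(extract_final_answer(completion))            # 5 minutes
print(extract_final_answer("I am not sure."))      # None
```

Returning `None` instead of guessing makes it easy to count how often the model drifts from the demonstrated format.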

### Dynamic Example Selection

Instead of fixed examples, select examples relevant to each new input. For a classification task, you might retrieve the most similar past examples and include those. This requires a retrieval system but can significantly improve accuracy on diverse inputs, because the examples are always relevant to the current case.
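
As a rough illustration of the idea, the sketch below ranks a pool of labeled examples by word overlap (Jaccard similarity) with the new input and keeps the top few. Word overlap is a deliberately simple stand-in chosen so the example runs with no dependencies. A real retrieval system would more likely compare embeddings. All function names here are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by total distinct words; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(new_input: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k pool examples whose text overlaps most with new_input."""
    query = tokenize(new_input)
    ranked = sorted(pool, key=lambda ex: jaccard(query, tokenize(ex[0])), reverse=True)
    return ranked[:k]


pool = [
    ("The app crashes when I try to upload a photo.", "Bug"),
    ("It would be great if I could export my data as CSV.", "Feature Request"),
    ("How do I change my notification settings?", "Question"),
    ("The new update ruined the interface, it's unusable now.", "Complaint"),
]

chosen = select_examples("The app crashes every time I open settings.", pool, k=2)
print([label for _, label in chosen])  # ['Bug', 'Question']
```

One caveat: selecting only the nearest examples can leave some output classes undemonstrated, so it is worth checking coverage on the selected set too.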

### Negative Examples

Include examples that show what not to do. One lightweight way is to pair a borderline input with its correct label and a short note naming the label it should not receive and why. This helps the model learn where the boundaries are, and it is especially useful for classification tasks where classes are close together.

```
Review: "It's okay I guess."
Classification: Neutral
Note: Not Positive — "okay" doesn't indicate enthusiasm.

Review: "It's fine, does the job."
Classification: Neutral
Note: Not Negative — "fine" and "does the job" indicate mild acceptance.
```

The negative examples clarify where the boundaries are, reducing misclassification of borderline inputs.

## Measuring Few-Shot Prompt Quality

To know if your few-shot prompt is working, evaluate it systematically:

1. **Create a test set** of 20 to 50 inputs that represent the range of real inputs, each labeled with the correct output.
2. **Run the prompt** on all test inputs.
3. **Measure accuracy** — what percentage of outputs match the correct labels?
4. **Analyze failures** — group the wrong outputs by error type. Are they format errors, classification errors, or something else?
5. **Refine** — add or adjust examples to address the most common failure types.
6. **Re-test** — run the revised prompt on the same test set and compare.

This turns prompt tuning from guesswork into a measurable improvement loop. Keep your test set stable so you can compare versions fairly.
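
The measuring and failure-analysis steps can be sketched as a small evaluation function. The `classify` argument stands for whatever function sends your few-shot prompt to a model and returns its answer. Here it is replaced by a keyword-based stub so the example runs on its own. Every name below is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable


def evaluate(
    classify: Callable[[str], str],
    test_set: list[tuple[str, str]],
    valid_labels: set[str],
) -> tuple[float, Counter]:
    """Return accuracy and a count of failures grouped by error type."""
    failures: Counter = Counter()
    correct = 0
    for text, expected in test_set:
        predicted = classify(text).strip()
        if predicted == expected:
            correct += 1
        elif predicted not in valid_labels:
            failures["format error"] += 1  # output is not one of the allowed labels
        else:
            failures[f"{expected} -> {predicted}"] += 1  # wrong but well-formed label
    accuracy = correct / len(test_set) if test_set else 0.0
    return accuracy, failures


def stub_model(text: str) -> str:
    """Stand-in for a real model call: a naive keyword rule."""
    lowered = text.lower()
    if "love" in lowered:
        return "Positive"
    if "broken" in lowered:
        return "Negative"
    return "Neutral"


test_set = [
    ("I love it.", "Positive"),
    ("Arrived broken.", "Negative"),
    ("It does the job.", "Neutral"),
    ("I love the concept but the execution is terrible.", "Negative"),
]

accuracy, failures = evaluate(stub_model, test_set, {"Positive", "Negative", "Neutral"})
print(accuracy)        # 0.75
print(dict(failures))  # {'Negative -> Positive': 1}
```

Grouping failures as "expected -> predicted" shows which boundary needs another example, which is usually more useful than the accuracy number alone.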

## Conclusion

Few-shot prompting is one of the most reliable techniques in prompt engineering. By showing the model the exact pattern you want through a small number of well-chosen examples, you get more accurate, more consistent output than instructions alone can achieve. The key principles are simple: choose representative examples, cover the output space, keep formatting consistent, and test on inputs the model hasn't seen. Master this technique and you'll solve a large fraction of real-world prompting challenges — from classification to formatting to style control — with a few lines of well-placed examples.

Ready to put few-shot prompting into practice with a tool that helps you manage, test, and version your prompts? [Sign up at promptwright.net/signup](https://promptwright.net/signup) and start building prompts you can rely on.