Prompt Engineering

Few-Shot Prompting: Teaching by Example

Master few-shot prompting: how to select high-quality examples, format them correctly, and know when few-shot beats zero-shot. Includes patterns for classification, extraction, and generation tasks.

Few-shot prompting is the practice of including worked examples directly in your prompt to demonstrate the exact input-output pattern you want. It's one of the oldest and most reliable prompting techniques — and when done well, it can transform a mediocre result into a precise, consistently formatted output that requires no post-processing.

This guide covers how to construct effective few-shot prompts, how to select and sequence examples, and when few-shot outperforms zero-shot (and vice versa).

What Is Few-Shot Prompting?

In few-shot prompting, you provide the model with n input-output pairs before the actual query. These examples act as an implicit specification: they show rather than tell.

Zero-shot:

Extract the company name and dollar amount from this press release:
[press release text]

Few-shot (3-shot):

Extract the company name and dollar amount from each press release.

Press release: "Acme Corp announced today it has raised $42M in Series B funding..."
Output: {"company": "Acme Corp", "amount": "$42M", "round": "Series B"}

Press release: "DataFlow, a Berlin-based startup, closed a €15M seed round..."
Output: {"company": "DataFlow", "amount": "€15M", "round": "Seed"}

Press release: "TechWave said it completed an $8.5 million bridge financing..."
Output: {"company": "TechWave", "amount": "$8.5M", "round": "Bridge"}

Press release: [YOUR TEXT HERE]
Output:

The few-shot version communicates the output schema, handles currency variants, normalizes amounts, and even infers round type, all without spelling out a single one of those rules as an explicit instruction.
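In code, a prompt like this is usually assembled from a list of (input, output) pairs rather than hand-written each time. A minimal sketch, using the press-release examples above (the helper name `build_prompt` is illustrative, not from any library):

```python
# Build an n-shot extraction prompt from (input, output) example pairs.
# The instruction line and examples mirror the press-release prompt above.
EXAMPLES = [
    ('"Acme Corp announced today it has raised $42M in Series B funding..."',
     '{"company": "Acme Corp", "amount": "$42M", "round": "Series B"}'),
    ('"DataFlow, a Berlin-based startup, closed a €15M seed round..."',
     '{"company": "DataFlow", "amount": "€15M", "round": "Seed"}'),
    ('"TechWave said it completed an $8.5 million bridge financing..."',
     '{"company": "TechWave", "amount": "$8.5M", "round": "Bridge"}'),
]

def build_prompt(query: str) -> str:
    parts = ["Extract the company name and dollar amount from each press release.", ""]
    for text, output in EXAMPLES:
        parts.append(f"Press release: {text}")
        parts.append(f"Output: {output}")
        parts.append("")  # blank line between examples
    parts.append(f"Press release: {query}")
    parts.append("Output:")  # end mid-pattern so the model completes it
    return "\n".join(parts)

print(build_prompt('"Nimbus AI raised a $3M pre-seed..."'))
```

Keeping examples as data makes it easy to add, reorder, or A/B-test them without touching the prompt scaffolding.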

When Few-Shot Beats Zero-Shot

Few-shot outperforms zero-shot when:

  • Output format is non-standard — JSON with specific keys, CSV, custom markup
  • The task has implicit conventions — domain-specific classification, style matching
  • Edge cases matter — demonstrate how to handle ambiguous or unusual inputs
  • Consistency is critical — you need the same format across hundreds of calls
  • The task is novel to the model — rare domains or invented schemas

Zero-shot is preferable when:

  • The task is simple and well-understood (sentiment: positive/negative/neutral)
  • Latency and cost are constraints (each example adds tokens)
  • You want maximum flexibility in the response

Anatomy of a Good Example

Each few-shot example should have three properties:

1. Representative Input

Choose inputs that reflect the real distribution of queries you'll receive, not just the easy, clean cases. If 20% of real inputs will be messy or unusual, at least one of your examples should be too.

2. Correct, Ideal Output

The output must be exactly what you want. Sloppy examples teach sloppy behavior. If your output is JSON, it must be valid JSON with consistent key names. If it's prose, it must match the tone and length you want.

3. A Consistent Format Separator

Use consistent delimiters between examples and between input/output. Common patterns:

# Pattern 1: Label-based
Input: ...
Output: ...

# Pattern 2: XML-style
<example>
<input>...</input>
<output>...</output>
</example>

# Pattern 3: Q/A style
Q: ...
A: ...

Pick one and use it consistently across all examples and the final query.
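One simple way to enforce consistency is to render every example through the same template. A sketch of that idea, with the three patterns above as templates (names are illustrative):

```python
# Render (input, output) pairs with one consistent delimiter pattern,
# so every example and the final query share an identical format.
PATTERNS = {
    "label": "Input: {inp}\nOutput: {out}",
    "xml":   "<example>\n<input>{inp}</input>\n<output>{out}</output>\n</example>",
    "qa":    "Q: {inp}\nA: {out}",
}

def render_examples(pairs, pattern="label"):
    template = PATTERNS[pattern]
    return "\n\n".join(template.format(inp=i, out=o) for i, o in pairs)

print(render_examples([("2+2", "4"), ("3+3", "6")], pattern="qa"))
```

Because every example flows through one template, a format drift in example three versus example one simply cannot happen.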

How Many Examples?

More is not always better. Research and practice suggest:

  • 1-shot — enough for simple format demonstration
  • 3-shot — the sweet spot for most classification and extraction tasks
  • 5-10 shot — needed for tasks with meaningful variation or complex reasoning
  • 10+ shot — diminishing returns; consider fine-tuning instead

For tasks with multiple classes or output types, aim for at least one example per class.
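The one-example-per-class rule is easy to check mechanically before a prompt ships. A small sketch (the function name is illustrative):

```python
# Report which classes have no few-shot example covering them.
from collections import Counter

def missing_classes(examples, classes):
    """examples: list of (input, label) pairs; returns labels with no example."""
    seen = Counter(label for _, label in examples)
    return [c for c in classes if seen[c] == 0]

examples = [("Great!", "positive"), ("Broken on arrival", "negative")]
print(missing_classes(examples, ["positive", "negative", "neutral"]))  # ['neutral']
```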

Example Selection Strategy

This is the highest-leverage decision in few-shot prompting.

Diversity Over Repetition

Don't pick three examples that are nearly identical. Cover different facets of the task:

# BAD: Three nearly identical positive reviews
Input: "Great product, loved it!"
Output: positive

Input: "Amazing quality, highly recommend"
Output: positive

Input: "Best purchase I've made this year"
Output: positive

# GOOD: Covers different cases
Input: "Great product, loved it!"
Output: positive

Input: "Arrived damaged and customer service was unhelpful"
Output: negative

Input: "It's fine, does what it says, nothing special"
Output: neutral
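When you have a pool of candidate examples, label diversity can be enforced in code rather than by eye. A minimal greedy sketch, assuming each candidate carries a label (the function name is illustrative):

```python
# Greedy selection: take at most one example per label first, then fill
# any remaining slots, so the final set covers distinct facets of the task.
def select_diverse(pool, k):
    """pool: list of (input, label) pairs. Returns up to k label-diverse examples."""
    chosen, seen_labels = [], set()
    for ex in pool:                      # first pass: one example per label
        if ex[1] not in seen_labels:
            chosen.append(ex)
            seen_labels.add(ex[1])
        if len(chosen) == k:
            return chosen
    for ex in pool:                      # second pass: fill remaining slots
        if ex not in chosen:
            chosen.append(ex)
        if len(chosen) == k:
            break
    return chosen

pool = [("Great product, loved it!", "positive"),
        ("Amazing quality, highly recommend", "positive"),
        ("Arrived damaged and customer service was unhelpful", "negative"),
        ("It's fine, does what it says, nothing special", "neutral")]
print(select_diverse(pool, 3))
```

The result covers positive, negative, and neutral with three examples, exactly the GOOD set above. Combine this with the recency-bias advice below by sorting so your most representative pick lands last.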

Recency Bias

Models weight recent context more heavily. Put your most important or representative example last — immediately before the actual query.

Show Don't Tell on Edge Cases

If there's a tricky case users will encounter, demonstrate it explicitly rather than describing it in instructions:

# Showing how to handle missing data
Input: "TechCorp is exploring fundraising options but has not announced a final amount"
Output: {"company": "TechCorp", "amount": null, "round": "unknown"}

Few-Shot for Different Task Types

Classification

Classify the support ticket priority.

Ticket: "I can't log in to my account at all"
Priority: High

Ticket: "The dark mode toggle is slightly misaligned"
Priority: Low

Ticket: "Payment processing is failing for all enterprise customers"
Priority: Critical

Ticket: [NEW TICKET]
Priority:

Structured Extraction

Extract action items as a JSON array.

Meeting notes: "John will update the API docs by Friday. Sarah is owning the
design review and needs it done before the sprint ends. No deadline on the
backlog cleanup — just when we get to it."
Action items: [
  {"owner": "John", "task": "Update API docs", "deadline": "Friday"},
  {"owner": "Sarah", "task": "Design review", "deadline": "End of sprint"},
  {"owner": null, "task": "Backlog cleanup", "deadline": null}
]

Meeting notes: [NEW NOTES]
Action items:

Style Transfer

Rewrite the sentence in the style of Ernest Hemingway.

Original: "The meteorological conditions were characterized by significant
precipitation and a considerable reduction in ambient temperature."
Hemingway: "It rained hard and turned cold."

Original: "She experienced a profound emotional response upon encountering
her former romantic partner in an unexpected context."
Hemingway: "She saw him and felt something she didn't name."

Original: [YOUR SENTENCE]
Hemingway:

Combining Few-Shot with System Prompts

For production use, combine a system prompt (for role and constraints) with few-shot examples (for format and edge cases):

System: You are a data extraction assistant. Always return valid JSON.
Never include null fields — omit them entirely. Be conservative:
if you're not certain about a value, omit it.

User:
Extract entities from: "Microsoft acquired Activision for $68.7B in 2023."
Output: {"acquirer": "Microsoft", "target": "Activision", "value": "$68.7B", "year": 2023}

Extract entities from: "The merger was called off before completion."
Output: {}

Extract entities from: [YOUR TEXT]
Output:
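With a chat API, the same structure maps naturally onto a role-based message list: the system prompt carries the constraints, and each few-shot example becomes a user/assistant turn pair. A sketch of that assembly (the exact client call depends on your provider and is omitted; `build_messages` is an illustrative name):

```python
# Assemble a chat-style message list: system prompt for constraints,
# few-shot examples as alternating user/assistant turns, query last.
SYSTEM = ("You are a data extraction assistant. Always return valid JSON. "
          "Never include null fields — omit them entirely. Be conservative: "
          "if you're not certain about a value, omit it.")

EXAMPLES = [
    ('Extract entities from: "Microsoft acquired Activision for $68.7B in 2023."',
     '{"acquirer": "Microsoft", "target": "Activision", "value": "$68.7B", "year": 2023}'),
    ('Extract entities from: "The merger was called off before completion."',
     '{}'),
]

def build_messages(query: str) -> list:
    messages = [{"role": "system", "content": SYSTEM}]
    for user_text, assistant_text in EXAMPLES:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": f'Extract entities from: "{query}"'})
    return messages

msgs = build_messages("Alpha Inc bought Beta Ltd.")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user', 'assistant', 'user']
```

Putting examples in assistant turns, rather than pasting them into one user message, shows the model outputs in exactly the position where it will generate its own.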

Debugging Few-Shot Prompts

When few-shot results are inconsistent:

  1. Check example quality — is the output in each example exactly correct?
  2. Check format consistency — are all delimiters identical?
  3. Add more diversity — are you missing an example that covers the failing case?
  4. Check for contradictions — do your examples imply conflicting rules?
  5. Verify ordering — move the most representative example to last position
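Checks 1 and 2 can be partially automated for JSON-output tasks: validate that every example output parses and that all examples use the same keys. A minimal linter sketch (the function name is illustrative):

```python
# Lint few-shot example outputs: valid JSON, consistent key sets.
import json

def lint_examples(outputs):
    """outputs: list of example output strings. Returns a list of issue messages."""
    issues, keysets = [], []
    for i, out in enumerate(outputs):
        try:
            data = json.loads(out)
        except json.JSONDecodeError:
            issues.append(f"example {i}: invalid JSON")
            continue
        keysets.append((i, frozenset(data)))
    if keysets:
        first_i, first_keys = keysets[0]
        for i, keys in keysets[1:]:
            if keys != first_keys:
                issues.append(f"example {i}: keys {sorted(keys)} differ from example {first_i}")
    return issues

print(lint_examples([
    '{"company": "Acme", "amount": "$42M"}',
    '{"company": "DataFlow", "amt": "€15M"}',   # inconsistent key name
    'not json',
]))
```

Running a check like this in CI catches the sloppy-example failures before they reach users.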

Key Takeaways

  • Few-shot works by demonstrating patterns, not describing them
  • 3 diverse, high-quality examples beat 10 repetitive ones
  • Consistent formatting across all examples is non-negotiable
  • Show edge cases explicitly — don't describe them
  • Combine with system prompts for production reliability
  • When few-shot fails, debug the examples before anything else