Chain-of-Thought Prompting: Think Step by Step

Discover when and how to use chain-of-thought prompting. Covers zero-shot CoT, few-shot CoT, and real examples showing where step-by-step reasoning dramatically improves accuracy.

Chain-of-thought (CoT) prompting is one of the most impactful techniques in the prompt engineer's toolkit. By asking a model to reason through a problem step by step before giving an answer, you can dramatically improve accuracy on tasks that require logic, math, multi-step reasoning, or careful analysis.

This guide explains the mechanics behind CoT, when to use it, and how to implement both zero-shot and few-shot variants effectively.

Why CoT Works

Large language models generate text token by token. Without explicit reasoning steps, the model jumps directly from input to output — which works fine for simple lookups but breaks down on anything requiring intermediate steps. CoT forces the model to externalize its reasoning, which serves two purposes:

  1. It gives the model computational "scratch space" — intermediate steps allow the model to build toward a correct answer rather than guessing in one shot.
  2. It makes reasoning visible — you can inspect where the logic went wrong if the answer is incorrect.

This is especially powerful for:

  • Math and arithmetic
  • Multi-step logical deduction
  • Code debugging
  • Medical/legal reasoning
  • Complex classification with nuanced criteria

Zero-Shot CoT

The simplest form of chain-of-thought prompting requires no examples. You simply append a reasoning instruction to your prompt.

The classic phrase: "Let's think step by step."

Example without CoT:

Prompt: A store has 48 apples. They sell 1/3 in the morning and 1/4 of the
remaining in the afternoon. How many apples are left?

Response: 20

(This is wrong; the correct answer is 24. A typical failure mode is taking 1/4 of the original 48 rather than of the 32 remaining, which gives 48 − 16 − 12 = 20.)

Example with zero-shot CoT:

Prompt: A store has 48 apples. They sell 1/3 in the morning and 1/4 of the
remaining in the afternoon. How many apples are left?

Let's think step by step.

Response:
- Start: 48 apples
- Morning sales: 48 × 1/3 = 16 apples sold → 32 remaining
- Afternoon sales: 32 × 1/4 = 8 apples sold → 24 remaining
- Answer: 24 apples

(Now correct — and verifiable.)
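The zero-shot technique is simple enough to wrap in a helper that appends the trigger phrase. A minimal sketch in Python; the function name and default trigger are illustrative, not a standard API:

```python
def add_cot(prompt: str, trigger: str = "Let's think step by step.") -> str:
    """Append a zero-shot chain-of-thought trigger to a prompt."""
    return f"{prompt.rstrip()}\n\n{trigger}"

question = (
    "A store has 48 apples. They sell 1/3 in the morning and 1/4 of the "
    "remaining in the afternoon. How many apples are left?"
)
print(add_cot(question))
```

The blank line before the trigger keeps it visually separate from the question, mirroring the example above.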

Variants of the Zero-Shot Trigger Phrase

The exact phrasing matters less than you might think, but these formulations consistently perform well:

  • "Think through this step by step before answering."
  • "Reason carefully before giving your final answer."
  • "Work through this problem systematically."
  • "Before answering, identify the key steps involved."

For structured output, you can be more prescriptive:

Reason through this step by step. Format your response as:
Thinking: [your reasoning]
Answer: [final answer only]
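One benefit of the prescriptive format is that the reasoning and the final answer can be separated programmatically downstream. A sketch of such a parser, assuming the model honors the Thinking/Answer labels; the fallback behavior is a design choice, not a standard:

```python
import re

def parse_cot_response(text: str) -> dict:
    """Split a 'Thinking: ... Answer: ...' response into its two parts."""
    match = re.search(
        r"Thinking:\s*(?P<thinking>.*?)\s*Answer:\s*(?P<answer>.*)",
        text,
        flags=re.DOTALL,
    )
    if not match:
        # Model ignored the format: treat the whole response as the answer.
        return {"thinking": "", "answer": text.strip()}
    return {
        "thinking": match.group("thinking"),
        "answer": match.group("answer").strip(),
    }

result = parse_cot_response(
    "Thinking: 48/3 = 16 sold, 32 left; 32/4 = 8 sold.\nAnswer: 24 apples"
)
print(result["answer"])  # 24 apples
```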

Few-Shot CoT

Few-shot CoT provides worked examples that demonstrate the reasoning pattern you want the model to follow. This is more powerful than zero-shot when:

  • The reasoning pattern is non-obvious
  • The domain is specialized
  • You need a specific reasoning structure
  • Zero-shot CoT produces inconsistent formats

Constructing Few-Shot CoT Examples

A few-shot CoT example has three parts:

  1. The question — representative of the task
  2. The reasoning chain — explicit intermediate steps
  3. The final answer — clearly demarcated

Example: Legal clause classification

Classify whether each contract clause creates a liability for the vendor.

Clause: "Vendor shall not be liable for delays caused by circumstances
beyond its reasonable control, including acts of God, government actions,
or supplier failures."
Reasoning: This is a force majeure clause. It explicitly limits vendor
liability for specific external circumstances. The clause reduces, not
creates, liability.
Classification: No liability created

Clause: "Vendor warrants that deliverables will be free from defects
for 90 days following delivery and will remedy any defects at no cost."
Reasoning: This clause makes an affirmative warranty and commits the
vendor to remediation. A defective delivery triggers an obligation.
Vendor faces potential liability if the warranty is breached.
Classification: Liability created

Clause: "Either party may terminate this agreement with 30 days written notice."
Reasoning: [NOW CLASSIFY THIS]

The model follows the established pattern to reason through the new example.
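The three-part structure makes few-shot prompts easy to assemble programmatically. A sketch, assuming each worked example is stored as a dict with clause, reasoning, and classification fields; the field names and labels are illustrative:

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble a few-shot CoT prompt: instruction, worked examples, new case."""
    parts = [instruction, ""]
    for ex in examples:
        parts += [
            f"Clause: {ex['clause']}",
            f"Reasoning: {ex['reasoning']}",
            f"Classification: {ex['classification']}",
            "",
        ]
    # End with a bare "Reasoning:" so the model continues the pattern.
    parts += [f"Clause: {new_input}", "Reasoning:"]
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify whether each contract clause creates a liability for the vendor.",
    [{"clause": "Vendor shall not be liable for delays ...",
      "reasoning": "Force majeure clause; it limits, not creates, liability.",
      "classification": "No liability created"}],
    "Either party may terminate this agreement with 30 days written notice.",
)
print(prompt)
```

Ending the prompt at "Reasoning:" is the key move: the model completes the reasoning chain before emitting a classification, just as the worked examples demonstrate.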

Choosing Examples for Few-Shot CoT

Example selection is the highest-leverage variable in few-shot prompting:

  • Diversity — cover different sub-types of the task, not just the most common case
  • Difficulty — include at least one example where the naive answer would be wrong
  • Clarity — the reasoning chain should be unambiguous; don't model sloppy thinking
  • Length calibration — examples set the implicit length norm; don't use 500-word reasoning chains if you want 50-word answers
  • Order — put harder, more representative examples closer to the end (recency bias)
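The ordering heuristic is mechanical enough to automate. A sketch, assuming each example carries a hand-assigned difficulty score; the `difficulty` field is an assumption for illustration, not a standard attribute:

```python
def order_examples(examples):
    """Order few-shot examples so harder ones sit last (recency bias)."""
    # Assumes each example dict has a hand-assigned "difficulty" score.
    return sorted(examples, key=lambda ex: ex["difficulty"])

examples = [
    {"id": "tricky-edge-case", "difficulty": 3},
    {"id": "common-case", "difficulty": 1},
    {"id": "moderate-case", "difficulty": 2},
]
print([ex["id"] for ex in order_examples(examples)])
# ['common-case', 'moderate-case', 'tricky-edge-case']
```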

When CoT Hurts Performance

CoT is not always beneficial. It can degrade performance when:

  • The task is simple — asking a model to "think step by step" about a factual lookup adds noise without benefit
  • Speed matters more than accuracy — CoT increases token usage and latency
  • The task is purely creative — reasoning chains can constrain creative outputs
  • The model is small — CoT provides diminishing returns on models below a certain capability threshold

A useful rule of thumb: if a competent human could answer in under 5 seconds without writing anything down, CoT probably won't help.

CoT in Multi-Turn Conversations

In multi-turn settings, you can build reasoning chains across messages:

User: I need to decide whether to migrate our database to PostgreSQL.
Assistant: Let me work through the key decision factors with you.
First, what database are you currently using and what's the primary pain point?

User: We're on MySQL 5.7, pain point is JSON query performance.

Assistant: That's a common motivation. Let me reason through this:
1. PostgreSQL's JSONB type typically outperforms MySQL's JSON on complex queries...
2. Migration complexity depends on schema size and stored procedure usage...
3. The risk profile is...

This "collaborative CoT" style keeps the user engaged while externalizing reasoning.

Practical Templates

For math/logic problems:

Solve this step by step. Show each calculation. Box your final answer.

For analysis tasks:

Before giving your recommendation, list:
1. Key facts from the input
2. Assumptions you're making
3. Trade-offs involved
Then state your recommendation and rationale.

For debugging:

Diagnose this issue step by step:
1. What is the expected behavior?
2. What is the actual behavior?
3. What are the possible causes?
4. Which cause best fits the evidence?
5. What is the recommended fix?
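If you reuse templates like these across an application, it can help to keep them in one place and select by task type. A minimal sketch; the task-type keys and helper name are illustrative:

```python
COT_TEMPLATES = {
    "math": "Solve this step by step. Show each calculation. Box your final answer.",
    "analysis": (
        "Before giving your recommendation, list:\n"
        "1. Key facts from the input\n"
        "2. Assumptions you're making\n"
        "3. Trade-offs involved\n"
        "Then state your recommendation and rationale."
    ),
    "debugging": (
        "Diagnose this issue step by step:\n"
        "1. What is the expected behavior?\n"
        "2. What is the actual behavior?\n"
        "3. What are the possible causes?\n"
        "4. Which cause best fits the evidence?\n"
        "5. What is the recommended fix?"
    ),
}

def make_prompt(task_type: str, task: str) -> str:
    """Prefix the task with the matching CoT template."""
    return f"{COT_TEMPLATES[task_type]}\n\n{task}"

print(make_prompt("math", "What is 17 * 24?"))
```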

Key Takeaways

  • CoT works by forcing intermediate reasoning steps rather than direct input-to-output jumps
  • Zero-shot CoT (append "think step by step") is often enough for math and logic
  • Few-shot CoT with carefully chosen examples handles specialized or structured reasoning
  • CoT adds tokens and latency — use it when accuracy matters more than speed
  • Always separate the reasoning chain from the final answer for cleaner downstream processing