AI Concepts

AI Agents and Tool Use: A Practical Guide

What AI agents actually are, how function calling works, planning loops, multi-agent architectures, and how to evaluate and safely deploy agent systems.

AI agents are the most talked-about — and most misunderstood — development in the AI space. This guide cuts through the hype with a practical explanation of what agents are, how function calling works, how planning loops operate, and how to think about safety and reliability when deploying them.

What Is an AI Agent?

An AI agent is a system where a language model drives a loop:

  1. Observe — receive input (user request, tool output, environment state)
  2. Reason — decide what action to take
  3. Act — call a tool, run code, send a message, query a database
  4. Repeat — until the task is complete or a stopping condition is reached

The model is the reasoning core. Tools are the interface to the external world. The loop is what makes it an agent rather than a single-turn interaction.

A chatbot answers questions. An agent takes actions.
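The loop above can be sketched in a few lines. Here `llm` and `tools` are hypothetical stand-ins: `llm` maps an observation to a `(thought, action, action_input)` triple, and `tools` is a dict of name → callable. This is a conceptual skeleton, not a real API.

```python
def run_agent(task, llm, tools, max_steps=10):
    """Minimal observe-reason-act loop (sketch; llm and tools are stand-ins)."""
    observation = task                                    # 1. observe
    for _ in range(max_steps):                            # 4. repeat, with a hard stop
        thought, action, action_input = llm(observation)  # 2. reason
        if action == "finish":
            return action_input                           # stopping condition reached
        observation = tools[action](action_input)         # 3. act, then observe the result
    raise RuntimeError("step budget exhausted before the task completed")
```

The `max_steps` cap is the stopping condition that keeps a confused model from looping forever.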

Function Calling (Tool Use)

Function calling is the mechanism that lets models interact with external systems. The model doesn't execute functions directly — it generates a structured request that your application intercepts and executes.

How It Works

1. You define tools (functions with names, descriptions, parameters)
2. User sends a message
3. Model responds with a tool_call instead of (or before) a text response
4. Your application executes the function
5. You send the result back to the model
6. Model continues reasoning with the new information

Code Example (Claude API)

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "execute_sql",
        "description": "Execute a read-only SQL query against the analytics database. Use for questions about user data, metrics, or business statistics.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "SQL SELECT query to execute. Must be read-only."
                }
            },
            "required": ["query"]
        }
    }
]

messages = [{"role": "user", "content": "How many users signed up in the last 30 days?"}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )
    
    if response.stop_reason != "tool_use":
        print(response.content[0].text)
        break
    
    # Append the assistant turn once, tool_use blocks included
    messages.append({"role": "assistant", "content": response.content})
    
    # Execute every requested tool and return all results in one user turn
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_sql(block.input["query"])  # your implementation
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })
    messages.append({"role": "user", "content": tool_results})

Planning Loops

For complex tasks, agents use planning — either explicit (the model writes a plan) or implicit (the model reasons step by step through tool calls).

ReAct Pattern

ReAct (Reasoning + Acting) is the most common agent pattern. The model alternates between reasoning about the situation and taking an action:

Thought: The user wants to know the top 5 customers by revenue. I should
query the orders table grouped by customer.

Action: execute_sql
Input: SELECT customer_id, SUM(total) as revenue FROM orders
       GROUP BY customer_id ORDER BY revenue DESC LIMIT 5

Observation: [(cust_42, 84500), (cust_17, 71200), (cust_88, 63100), ...]

Thought: I have the IDs but the user wants names. I should look up customer names.

Action: execute_sql
Input: SELECT id, name, email FROM customers WHERE id IN (42, 17, 88, ...)

Observation: [(42, "Acme Corp", ...), (17, "DataFlow Inc", ...)]

Thought: I now have all the information. I can present the final answer.

Answer: Your top 5 customers by revenue in the last period are:
1. Acme Corp — $84,500
2. DataFlow Inc — $71,200
...
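Frameworks that implement ReAct over plain text (rather than native tool calling) parse the `Action:` and `Input:` lines out of the model's output on every turn. A minimal sketch of that parsing step, assuming `tools` is a dict of name → callable:

```python
import re

def react_step(model_output, tools):
    """Parse one ReAct turn; return an Observation line, or None on a final answer."""
    action = re.search(r"Action:\s*(\w+)", model_output)
    if action is None:
        return None  # no Action line: the model gave its final answer
    action_input = re.search(r"Input:\s*(.+)", model_output, re.DOTALL)
    result = tools[action.group(1)](action_input.group(1).strip())
    return f"Observation: {result}"
```

The observation string is appended to the transcript and the model is called again, which produces the alternating Thought/Action/Observation trace shown above.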

Plan-and-Execute

For longer, multi-step tasks, separate planning from execution:

Step 1 (Planning): "Create a plan with specific steps to complete this task: [task]"
→ Model returns numbered plan

Step 2 (Execution): Execute each step, potentially with different tools or models

Step 3 (Verification): "Review the output of each step. Were all steps completed correctly?"
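The three steps can be wired together as follows. `llm` is a placeholder for any prompt-to-text model call, and the numbered-line parsing is a deliberate simplification:

```python
def plan_and_execute(task, llm):
    # Step 1: planning
    plan = llm(f"Create a plan with specific, numbered steps to complete this task: {task}")
    steps = [line.strip() for line in plan.splitlines()
             if line.strip()[:1].isdigit()]
    # Step 2: execution (each step could use different tools or models)
    results = [llm(f"Execute this step and report the outcome: {step}") for step in steps]
    # Step 3: verification
    verdict = llm("Review the output of each step. Were all steps completed correctly?\n"
                  + "\n".join(results))
    return results, verdict
```

Separating the planner call from the executor calls also lets you show the plan to a human for approval before any step runs.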

Multi-Agent Systems

Complex workflows can use multiple specialized agents working together:

Orchestrator Agent
├── Research Agent (web search + summarization)
├── Code Agent (code generation + execution)
└── Review Agent (quality checking + critique)

Each sub-agent has a specialized system prompt and tool set. The orchestrator delegates tasks and integrates results.

When to use multi-agent:

  • Tasks require different types of expertise or tools
  • Parallel work is possible (multiple agents working simultaneously)
  • Built-in verification through a separate review agent is valuable
  • The task is too complex for a single context window
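A minimal sketch of that delegation, with each sub-agent reduced to a callable backed by its own system prompt and tool set (the role names mirror the diagram above and are illustrative):

```python
def orchestrate(task, agents):
    """agents: dict of role -> callable(prompt) -> str (each a full sub-agent)."""
    findings = agents["research"](f"Gather the background needed for: {task}")
    draft = agents["code"](f"Complete the task using this research:\n{findings}\nTask: {task}")
    critique = agents["review"](f"Check this output for errors and omissions:\n{draft}")
    return {"draft": draft, "critique": critique}
```

In a real system the orchestrator would itself be a model deciding which sub-agent to call next; the fixed pipeline here just shows the delegation shape.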

Safety and Reliability

Principle of Least Privilege

Give agents only the tools they need for their specific task. An agent that needs to read data should not have write access. An agent that queries one database should not have access to others.
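One lightweight way to enforce this is a per-agent tool allowlist resolved when the agent is constructed; the registry contents and agent names below are illustrative:

```python
TOOL_REGISTRY = {
    "read_metrics": "...",    # placeholders for real tool definitions
    "write_records": "...",
    "query_billing": "...",
}

AGENT_ALLOWLISTS = {
    "reporting_agent": ["read_metrics"],   # read-only, one database
    "billing_agent": ["query_billing"],
}

def tools_for(agent_name):
    """Return only the tools this agent is permitted to see."""
    return {name: TOOL_REGISTRY[name] for name in AGENT_ALLOWLISTS[agent_name]}
```

Because the model never sees a tool it isn't allowed to use, it cannot be talked into calling one.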

Human-in-the-Loop for Consequential Actions

For any action that is irreversible or high-stakes, require human confirmation:

"I'm going to delete 847 records matching these criteria. Here's a preview
of what will be deleted:
[preview]

Type CONFIRM to proceed or CANCEL to abort."
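In code, that gate can be a thin wrapper around tool execution. The tool names and the `confirm` callback are assumptions; in a real deployment `confirm` might be a CLI prompt or a Slack approval flow:

```python
IRREVERSIBLE_TOOLS = {"delete_records", "send_email", "issue_refund"}  # illustrative names

def gated_execute(tool_name, params, execute_fn, confirm):
    """Run the tool only after explicit human approval for high-stakes actions."""
    if tool_name in IRREVERSIBLE_TOOLS:
        preview = f"About to run {tool_name} with {params}. Type CONFIRM to proceed."
        if confirm(preview) != "CONFIRM":
            return {"status": "cancelled_by_user"}
    return execute_fn(params)
```

Routing every tool call through one gate like this also gives you a single place to log approvals.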

Rate Limiting and Cost Guards

Agents in loops can make many more API calls than you expect. Implement:

  • Maximum iterations per task
  • Maximum cost per task
  • Timeout limits
  • Alerts when unusual call volumes are detected
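A simple budget guard the agent loop can check on every iteration; the per-token prices are placeholders, not real rates:

```python
class BudgetGuard:
    def __init__(self, max_iterations=20, max_cost_usd=2.00):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens, output_tokens):
        """Call once per model call; raises when a budget is exhausted."""
        self.iterations += 1
        # Placeholder prices per million tokens -- substitute your model's rates
        self.cost_usd += input_tokens * 5 / 1e6 + output_tokens * 25 / 1e6
        if self.iterations > self.max_iterations:
            raise RuntimeError("iteration budget exhausted")
        if self.cost_usd > self.max_cost_usd:
            raise RuntimeError("cost budget exhausted")
```

Raising an exception, rather than silently stopping, makes runaway loops visible in your error alerts.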

Sandboxed Code Execution

If your agent executes code (very common), run it in a sandbox:

  • Docker container with no network access
  • No access to production credentials
  • Read-only filesystem where possible
  • Time and memory limits
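The checklist above maps directly onto `docker run` flags. A helper that builds such a command; the image name, limits, and the use of the `timeout` utility are illustrative choices:

```python
def sandbox_command(code, memory="256m", timeout_s=30):
    """Build a docker run command that applies the guards listed above."""
    return [
        "timeout", str(timeout_s),   # wall-clock limit via coreutils timeout
        "docker", "run", "--rm",
        "--network", "none",         # no network access
        "--read-only",               # read-only filesystem
        "--memory", memory,          # memory limit
        "--cpus", "1",
        "python:3.12-slim",          # bare image: no production credentials baked in
        "python", "-c", code,
    ]
```

Pass the result to `subprocess.run(...)`; keeping command construction separate makes the guard flags easy to test.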

Comprehensive Logging

Log every tool call and result. Agent failures are hard to debug without a full trace of what the model decided to do and why.

def logged_tool_call(tool_name, input_params, execute_fn):
    logger.info(f"Tool call: {tool_name}", extra={"input": input_params})
    try:
        result = execute_fn(input_params)
        logger.info(f"Tool result: {tool_name}", extra={"output": result})
        return result
    except Exception as e:
        logger.error(f"Tool error: {tool_name}", extra={"error": str(e)})
        raise

Evaluating Agent Systems

Agents are hard to evaluate because success depends on the full trajectory, not just the final output:

  • Task success rate — did the agent complete the task correctly?
  • Efficiency — how many tool calls did it take? (fewer is generally better)
  • Error recovery — when a tool fails, does the agent recover gracefully?
  • Hallucination rate — does the agent invent tool calls or results?
  • Safety compliance — does the agent respect its constraints?

Build an evaluation suite with representative tasks and run it on every significant prompt or tool change.
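A minimal harness for such a suite. `agent` here is any callable that returns the final answer plus its tool-call count, and the task/check pairs are whatever is representative for your system:

```python
def run_eval_suite(agent, tasks):
    """tasks: list of (prompt, check_fn) pairs; check_fn returns True on success."""
    successes = 0
    tool_calls = 0
    for prompt, check in tasks:
        answer, n_calls = agent(prompt)
        successes += bool(check(answer))
        tool_calls += n_calls
    return {
        "task_success_rate": successes / len(tasks),
        "avg_tool_calls": tool_calls / len(tasks),
    }
```

Tracking average tool calls alongside success rate catches regressions where the agent still succeeds but burns far more calls doing it.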

Key Takeaways

  • Agents combine a reasoning model with a tool-calling loop and a stopping condition
  • Function calling is the mechanism — your application executes the functions, not the model
  • ReAct is the foundational pattern: reason, act, observe, repeat
  • Give agents only the tools they need — principle of least privilege
  • Human confirmation is essential for irreversible or high-stakes actions
  • Log every tool call — agent debugging requires full execution traces