AI Concepts

AI Agents and Tool Use: A Practical Guide

What AI agents actually are, how function calling works, planning loops, multi-agent architectures, and how to evaluate and safely deploy agent systems.

AI agents are the most talked-about — and most misunderstood — development in the AI space. This guide cuts through the hype with a practical explanation of what agents are, how function calling works, how planning loops operate, and how to think about safety and reliability when deploying them.

What Is an AI Agent?

An AI agent is a system where a language model drives a loop:

  1. Observe — receive input (user request, tool output, environment state)
  2. Reason — decide what action to take
  3. Act — call a tool, run code, send a message, query a database
  4. Repeat — until the task is complete or a stopping condition is reached

The model is the reasoning core. Tools are the interface to the external world. The loop is what makes it an agent rather than a single-turn interaction.

A chatbot answers questions. An agent takes actions.
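The loop above can be sketched in a few lines. Here `llm` and `tools` are hypothetical stand-ins: `llm` maps an observation to a `(thought, action, action_input)` triple, and `tools` is a dict of name → callable. This is a conceptual skeleton, not a real API.

```python
def run_agent(task, llm, tools, max_steps=10):
    """Minimal observe-reason-act loop (sketch; llm and tools are stand-ins)."""
    observation = task                                    # 1. observe
    for _ in range(max_steps):                            # 4. repeat, with a hard stop
        thought, action, action_input = llm(observation)  # 2. reason
        if action == "finish":
            return action_input                           # stopping condition reached
        observation = tools[action](action_input)         # 3. act, then observe the result
    raise RuntimeError("step budget exhausted before the task completed")
```

The `max_steps` cap is the stopping condition that keeps a confused model from looping forever.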

Function Calling (Tool Use)

Function calling is the mechanism that lets models interact with external systems. The model doesn't execute functions directly — it generates a structured request that your application intercepts and executes.

How It Works

1. You define tools (functions with names, descriptions, parameters)
2. User sends a message
3. Model responds with a tool_call instead of (or before) a text response
4. Your application executes the function
5. You send the result back to the model
6. Model continues reasoning with the new information

Code Example (Claude API)

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "execute_sql",
        "description": "Execute a read-only SQL query against the analytics database. Use for questions about user data, metrics, or business statistics.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "SQL SELECT query to execute. Must be read-only."
                }
            },
            "required": ["query"]
        }
    }
]

messages = [{"role": "user", "content": "How many users signed up in the last 30 days?"}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )
    
    if response.stop_reason != "tool_use":
        print(response.content[0].text)
        break
    
    # Append the assistant turn once, tool_use blocks included
    messages.append({"role": "assistant", "content": response.content})
    
    # Execute every requested tool and return all results in one user turn
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_sql(block.input["query"])  # your implementation
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })
    messages.append({"role": "user", "content": tool_results})

Planning Loops

For complex tasks, agents use planning — either explicit (the model writes a plan) or implicit (the model reasons step by step through tool calls).

ReAct Pattern

ReAct (Reasoning + Acting) is the most common agent pattern. The model alternates between reasoning about the situation and taking an action:

Thought: The user wants to know the top 5 customers by revenue. I should
query the orders table grouped by customer.

Action: execute_sql
Input: SELECT customer_id, SUM(total) as revenue FROM orders
       GROUP BY customer_id ORDER BY revenue DESC LIMIT 5

Observation: [(cust_42, 84500), (cust_17, 71200), (cust_88, 63100), ...]

Thought: I have the IDs but the user wants names. I should look up customer names.

Action: execute_sql
Input: SELECT id, name, email FROM customers WHERE id IN (42, 17, 88, ...)

Observation: [(42, "Acme Corp", ...), (17, "DataFlow Inc", ...)]

Thought: I now have all the information. I can present the final answer.

Answer: Your top 5 customers by revenue in the last period are:
1. Acme Corp — $84,500
2. DataFlow Inc — $71,200
...
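Frameworks that implement ReAct over plain text (rather than native tool calling) parse the `Action:` and `Input:` lines out of the model's output on every turn. A minimal sketch of that parsing step, assuming `tools` is a dict of name → callable:

```python
import re

def react_step(model_output, tools):
    """Parse one ReAct turn; return an Observation line, or None on a final answer."""
    action = re.search(r"Action:\s*(\w+)", model_output)
    if action is None:
        return None  # no Action line: the model gave its final answer
    action_input = re.search(r"Input:\s*(.+)", model_output, re.DOTALL)
    result = tools[action.group(1)](action_input.group(1).strip())
    return f"Observation: {result}"
```

The observation string is appended to the transcript and the model is called again, which produces the alternating Thought/Action/Observation trace shown above.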

Plan-and-Execute

For longer, multi-step tasks, separate planning from execution:

Step 1 (Planning): "Create a plan with specific steps to complete this task: [task]"
→ Model returns numbered plan

Step 2 (Execution): Execute each step, potentially with different tools or models

Step 3 (Verification): "Review the output of each step. Were all steps completed correctly?"
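The three steps can be wired together as follows. `llm` is a placeholder for any prompt-to-text model call, and the numbered-line parsing is a deliberate simplification:

```python
def plan_and_execute(task, llm):
    # Step 1: planning
    plan = llm(f"Create a plan with specific, numbered steps to complete this task: {task}")
    steps = [line.strip() for line in plan.splitlines()
             if line.strip()[:1].isdigit()]
    # Step 2: execution (each step could use different tools or models)
    results = [llm(f"Execute this step and report the outcome: {step}") for step in steps]
    # Step 3: verification
    verdict = llm("Review the output of each step. Were all steps completed correctly?\n"
                  + "\n".join(results))
    return results, verdict
```

Separating the planner call from the executor calls also lets you show the plan to a human for approval before any step runs.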

Multi-Agent Systems

Complex workflows can use multiple specialized agents working together:

Orchestrator Agent
├── Research Agent (web search + summarization)
├── Code Agent (code generation + execution)
└── Review Agent (quality checking + critique)

Each sub-agent has a specialized system prompt and tool set. The orchestrator delegates tasks and integrates results.

When to use multi-agent:

  • Tasks require different types of expertise or tools
  • Parallel work is possible (multiple agents working simultaneously)
  • Built-in verification through a separate review agent is valuable
  • The task is too complex for a single context window
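A minimal sketch of that delegation, with each sub-agent reduced to a callable backed by its own system prompt and tool set (the role names mirror the diagram above and are illustrative):

```python
def orchestrate(task, agents):
    """agents: dict of role -> callable(prompt) -> str (each a full sub-agent)."""
    findings = agents["research"](f"Gather the background needed for: {task}")
    draft = agents["code"](f"Complete the task using this research:\n{findings}\nTask: {task}")
    critique = agents["review"](f"Check this output for errors and omissions:\n{draft}")
    return {"draft": draft, "critique": critique}
```

In a real system the orchestrator would itself be a model deciding which sub-agent to call next; the fixed pipeline here just shows the delegation shape.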

Safety and Reliability

Principle of Least Privilege

Give agents only the tools they need for their specific task. An agent that needs to read data should not have write access. An agent that queries one database should not have access to others.
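One lightweight way to enforce this is a per-agent tool allowlist resolved when the agent is constructed; the registry contents and agent names below are illustrative:

```python
TOOL_REGISTRY = {
    "read_metrics": "...",    # placeholders for real tool definitions
    "write_records": "...",
    "query_billing": "...",
}

AGENT_ALLOWLISTS = {
    "reporting_agent": ["read_metrics"],   # read-only, one database
    "billing_agent": ["query_billing"],
}

def tools_for(agent_name):
    """Return only the tools this agent is permitted to see."""
    return {name: TOOL_REGISTRY[name] for name in AGENT_ALLOWLISTS[agent_name]}
```

Because the model never sees a tool it isn't allowed to use, it cannot be talked into calling one.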

Human-in-the-Loop for Consequential Actions

For any action that is irreversible or high-stakes, require human confirmation:

"I'm going to delete 847 records matching these criteria. Here's a preview
of what will be deleted:
[preview]

Type CONFIRM to proceed or CANCEL to abort."
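In code, that gate can be a thin wrapper around tool execution. The tool names and the `confirm` callback are assumptions; in a real deployment `confirm` might be a CLI prompt or a Slack approval flow:

```python
IRREVERSIBLE_TOOLS = {"delete_records", "send_email", "issue_refund"}  # illustrative names

def gated_execute(tool_name, params, execute_fn, confirm):
    """Run the tool only after explicit human approval for high-stakes actions."""
    if tool_name in IRREVERSIBLE_TOOLS:
        preview = f"About to run {tool_name} with {params}. Type CONFIRM to proceed."
        if confirm(preview) != "CONFIRM":
            return {"status": "cancelled_by_user"}
    return execute_fn(params)
```

Routing every tool call through one gate like this also gives you a single place to log approvals.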

Rate Limiting and Cost Guards

Agents in loops can make many more API calls than you expect. Implement:

  • Maximum iterations per task
  • Maximum cost per task
  • Timeout limits
  • Alerts when unusual call volumes are detected
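A simple budget guard the agent loop can check on every iteration; the per-token prices are placeholders, not real rates:

```python
class BudgetGuard:
    def __init__(self, max_iterations=20, max_cost_usd=2.00):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens, output_tokens):
        """Call once per model call; raises when a budget is exhausted."""
        self.iterations += 1
        # Placeholder prices per million tokens -- substitute your model's rates
        self.cost_usd += input_tokens * 5 / 1e6 + output_tokens * 25 / 1e6
        if self.iterations > self.max_iterations:
            raise RuntimeError("iteration budget exhausted")
        if self.cost_usd > self.max_cost_usd:
            raise RuntimeError("cost budget exhausted")
```

Raising an exception, rather than silently stopping, makes runaway loops visible in your error alerts.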

Sandboxed Code Execution

If your agent executes code (very common), run it in a sandbox:

  • Docker container with no network access
  • No access to production credentials
  • Read-only filesystem where possible
  • Time and memory limits
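The checklist above maps directly onto `docker run` flags. A helper that builds such a command; the image name, limits, and the use of the `timeout` utility are illustrative choices:

```python
def sandbox_command(code, memory="256m", timeout_s=30):
    """Build a docker run command that applies the guards listed above."""
    return [
        "timeout", str(timeout_s),   # wall-clock limit via coreutils timeout
        "docker", "run", "--rm",
        "--network", "none",         # no network access
        "--read-only",               # read-only filesystem
        "--memory", memory,          # memory limit
        "--cpus", "1",
        "python:3.12-slim",          # bare image: no production credentials baked in
        "python", "-c", code,
    ]
```

Pass the result to `subprocess.run(...)`; keeping command construction separate makes the guard flags easy to test.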

Comprehensive Logging

Log every tool call and result. Agent failures are hard to debug without a full trace of what the model decided to do and why.

def logged_tool_call(tool_name, input_params, execute_fn):
    logger.info(f"Tool call: {tool_name}", extra={"input": input_params})
    try:
        result = execute_fn(input_params)
        logger.info(f"Tool result: {tool_name}", extra={"output": result})
        return result
    except Exception as e:
        logger.error(f"Tool error: {tool_name}", extra={"error": str(e)})
        raise

Evaluating Agent Systems

Agents are hard to evaluate because success depends on the full trajectory, not just the final output:

  • Task success rate — did the agent complete the task correctly?
  • Efficiency — how many tool calls did it take? (fewer is generally better)
  • Error recovery — when a tool fails, does the agent recover gracefully?
  • Hallucination rate — does the agent invent tool calls or results?
  • Safety compliance — does the agent respect its constraints?

Build an evaluation suite with representative tasks and run it on every significant prompt or tool change.
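A minimal harness for such a suite. `agent` here is any callable that returns the final answer plus its tool-call count, and the task/check pairs are whatever is representative for your system:

```python
def run_eval_suite(agent, tasks):
    """tasks: list of (prompt, check_fn) pairs; check_fn returns True on success."""
    successes = 0
    tool_calls = 0
    for prompt, check in tasks:
        answer, n_calls = agent(prompt)
        successes += bool(check(answer))
        tool_calls += n_calls
    return {
        "task_success_rate": successes / len(tasks),
        "avg_tool_calls": tool_calls / len(tasks),
    }
```

Tracking average tool calls alongside success rate catches regressions where the agent still succeeds but burns far more calls doing it.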

Key Takeaways

  • Agents combine a reasoning model with a tool-calling loop and a stopping condition
  • Function calling is the mechanism — your application executes the functions, not the model
  • ReAct is the foundational pattern: reason, act, observe, repeat
  • Give agents only the tools they need — principle of least privilege
  • Human confirmation is essential for irreversible or high-stakes actions
  • Log every tool call — agent debugging requires full execution traces