03 - Custom Tools
What Is Tool Calling?
Large language models are remarkable at generating text, but they are fundamentally limited: they can only produce words. They cannot look up today's weather, query a database, execute code, or send an email. They are brains without hands.
Tool calling (sometimes called "function calling") bridges this gap. It's a protocol where the LLM can say: "I don't know the weather in Tokyo, but I know there's a get_weather tool that does. Let me call it with city: 'Tokyo', wait for the result, and then incorporate that result into my response."
Here's the mental model: you give the LLM a menu of tools with descriptions of what each tool does and what parameters it accepts. The LLM reads this menu and, during its reasoning, decides when to "order" from the menu. When it does, your code executes the tool and feeds the result back to the model, which then continues generating its response.
This tool-calling loop is what transforms a language model from a text generator into an agent that can take actions in the real world. In this chapter, you'll define your own tools and wire them into a session.
What You'll Learn
- How to define a `ToolDefinition` with name, label, description, parameters, and execute
- Using `@sinclair/typebox` (NOT Zod) for parameter schemas
- How `execute()` returns `{ content: [...], details: {} }`
- Tool execution events: `tool_execution_start`, `tool_execution_end`
- Passing custom tools via the `customTools` option
- How to write tool descriptions that LLMs understand well
Why TypeBox Instead of Zod?
If you've worked with TypeScript validation libraries, you've likely used Zod. It's the most popular choice in the ecosystem. So why does pi-coding-agent use TypeBox instead?
The reason is JSON Schema compatibility. When your tool definitions are sent to the LLM, the parameter schemas must be serialized as JSON Schema -- the format that OpenAI, Anthropic, and Google all use in their API specifications. TypeBox schemas are JSON Schema by construction. Every Type.String(), Type.Object(), and Type.Number() call produces a plain JSON Schema object directly, with no conversion step needed.
Zod, by contrast, uses its own internal representation. Converting Zod schemas to JSON Schema requires a separate library (zod-to-json-schema), introduces edge cases, and adds a dependency. TypeBox eliminates this entire category of problems.
If you're coming from Zod, the mapping is straightforward:
- `z.string()` becomes `Type.String()`
- `z.number()` becomes `Type.Number()`
- `z.object({...})` becomes `Type.Object({...})`
- `z.enum([...])` becomes `Type.Union([Type.Literal(...), ...])`
- Descriptions: `z.string().describe('...')` becomes `Type.String({ description: '...' })`
Tool Anatomy
Every tool in pi-coding-agent is defined as a ToolDefinition object with five fields:
Let's examine each field:
- `name` -- The identifier the LLM uses when calling this tool. Use `snake_case` (e.g., `get_weather`, `run_query`). This must be unique across all tools in a session.
- `label` -- A human-readable name shown in UIs and logs. Not sent to the LLM.
- `description` -- This is critically important. The LLM reads this description to decide when and how to use the tool. A vague description leads to the tool being used incorrectly or not at all. (More on this below.)
- `parameters` -- A TypeBox schema defining the inputs the tool accepts. The LLM generates arguments matching this schema. Each parameter should include a `description` to help the LLM fill it in correctly.
- `execute` -- The async function that runs when the LLM calls the tool. It receives the tool call ID, parsed parameters, an abort signal, and an update callback. It must return an object with `content` (an array of result items) and `details` (metadata).
See the ToolDefinition API reference for the full interface.
The Tool Execution Lifecycle
Understanding the lifecycle helps you debug tool-related issues and build robust tools:
The key insight is that tool execution happens inside the agent loop. After your tool returns, the LLM sees the result and may decide to call another tool, ask a follow-up question, or produce its final response. A single session.prompt() call can trigger multiple tool executions in sequence.
Example: Weather Tool
This tool simulates a weather API. In production, you'd replace the hardcoded data with a real HTTP call:
Notice that the tool returns structured data as a JSON string, not a natural language sentence. This is intentional -- the LLM is better at interpreting structured data and weaving it into a natural response than receiving a pre-formatted sentence. Let the model handle the presentation.
Example: Calculator Tool
This tool demonstrates error handling within execute(). When the expression is invalid, instead of throwing an exception, the tool returns an error message as its content. This lets the LLM see the error, understand what went wrong, and potentially retry with a corrected expression or explain the issue to the user.
:::warning
The `Function()` constructor used in this calculator tool is a simplified demo. In production, never evaluate arbitrary user-provided expressions this way -- it's equivalent to `eval()` and is a serious security risk. Use a proper math expression parser like mathjs instead.
:::
Wiring Tools Into a Session
Pass tools via the customTools array:
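In outline, the options object looks like this. Note the hedge: the `ToolDefinition` and `SessionOptions` types below are local placeholders standing in for pi-coding-agent's real exports, and how you pass the options to a session depends on your version's session-creation API -- the point here is only the two separate arrays:

```typescript
// Placeholder types for illustration; the real ones come from pi-coding-agent.
interface ToolDefinition { name: string; /* label, description, parameters, execute... */ }

interface SessionOptions {
  tools: ToolDefinition[];       // built-in coding tools (file read/write, bash, ...)
  customTools: ToolDefinition[]; // your own tools
}

const weatherTool: ToolDefinition = { name: 'get_weather' };
const calculatorTool: ToolDefinition = { name: 'calculate' };

// Built-in capabilities and custom tools stay in separate arrays,
// so each can be enabled or disabled independently.
const options: SessionOptions = {
  tools: [],                               // e.g. run with no built-in coding tools
  customTools: [weatherTool, calculatorTool],
};
```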
The tools array is for pi-coding-agent's built-in coding tools (file read, file write, bash execution, etc.). The customTools array is for your tools. Keep them separate -- this makes it easy to enable or disable built-in coding capabilities independently of your custom tools.
Observing Tool Events
Just like text streaming, tool execution is observable through the event system:
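A minimal sketch of an event handler follows. The event names come from this chapter; the payload fields (`toolName`, `toolCallId`) and the direct function call standing in for the session's subscription mechanism are assumptions for illustration:

```typescript
// Stand-in event types; real payloads come from pi-coding-agent's event system.
type ToolEvent =
  | { type: 'tool_execution_start'; toolName: string; toolCallId: string }
  | { type: 'tool_execution_end'; toolName: string; toolCallId: string };

const log: string[] = [];

function onEvent(event: ToolEvent) {
  if (event.type === 'tool_execution_start') {
    log.push(`Running ${event.toolName}...`); // e.g. show "Loading weather data..." in a UI
  } else if (event.type === 'tool_execution_end') {
    log.push(`Finished ${event.toolName}`);
  }
}

// Simulated event sequence, in the order the agent loop would emit it:
onEvent({ type: 'tool_execution_start', toolName: 'get_weather', toolCallId: 'call_1' });
onEvent({ type: 'tool_execution_end', toolName: 'get_weather', toolCallId: 'call_1' });
```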
The tool_execution_start event fires before your execute() function runs, and tool_execution_end fires after it returns. This lets you build UIs that show "Loading weather data..." while the tool is running, or log tool calls for debugging.
Designing Effective Tools
The quality of your tool descriptions and parameter schemas directly affects how well the LLM uses them. Here are guidelines learned from production experience:
Write descriptions from the LLM's perspective
Your description is the LLM's only guide for deciding when to use a tool. Write it as if you're explaining to a human assistant:
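For example (both description strings are illustrative):

```typescript
// Too vague -- the LLM cannot tell when this tool applies:
const vague = 'Gets data.';

// Specific trigger conditions and return format -- written for the LLM:
const specific =
  'Get the current weather for a city. Use this whenever the user asks about ' +
  'weather, temperature, or outdoor conditions in a specific location. Returns ' +
  'temperature in Celsius and a short conditions summary as JSON.';
```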
Include examples in parameter descriptions
LLMs understand structured data better with examples:
Keep tools focused
Each tool should do one thing well. Instead of a single do_everything tool with many optional parameters, create several focused tools. The LLM is better at choosing among specific tools than navigating complex parameter combinations.
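Concretely (the calendar tool names below are hypothetical examples, not part of pi-coding-agent):

```typescript
// Instead of one catch-all tool with many optional parameters...
const doEverything = { name: 'manage_calendar' /* action, date, title, attendees, ... */ };

// ...prefer several focused tools, each with a small, required parameter set:
const focused = [
  { name: 'create_event' },
  { name: 'delete_event' },
  { name: 'list_events' },
];
```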
Return structured data, not natural language
Return JSON objects or structured text, not pre-formatted sentences. The LLM is excellent at presenting data in a human-friendly way, and it can adapt the presentation to the conversation context:
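Side by side (both example values are illustrative):

```typescript
// Avoid: a pre-formatted sentence locks in one presentation.
const proseResult = 'The weather in Tokyo is 22 degrees and partly cloudy.';

// Prefer: structured data the model can present however the conversation needs.
const structuredResult = JSON.stringify({
  city: 'Tokyo',
  tempC: 22,
  conditions: 'partly cloudy',
});
```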
Handle errors gracefully
Never throw exceptions from execute(). Always return an error as content so the LLM can reason about it:
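The pattern, in miniature (`riskyOperation` is a hypothetical stand-in for whatever your tool actually does):

```typescript
// Catch inside execute() and surface the error as content.
async function execute(input: string) {
  try {
    const value = riskyOperation(input);
    return { content: [{ type: 'text' as const, text: JSON.stringify({ value }) }], details: {} };
  } catch (err) {
    // The LLM sees this text and can retry or explain -- the agent keeps running.
    return {
      content: [{ type: 'text' as const, text: `Error: ${(err as Error).message}` }],
      details: { error: true },
    };
  }
}

// Hypothetical operation that may throw, used only for this demo.
function riskyOperation(input: string): number {
  if (input === '') throw new Error('empty input');
  return input.length;
}
```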
When a tool returns an error, the LLM often handles it gracefully -- it might apologize, explain what went wrong, or try a different approach. This is much better than crashing the entire agent with an unhandled exception.
Common Mistakes
- **Forgetting `as const` on the content type.** TypeScript needs `type: 'text' as const` to narrow the string literal type. Without `as const`, TypeScript infers `type: string`, which doesn't satisfy the `ToolResultContent` union type.
- **Using Zod instead of TypeBox.** The `parameters` field must be a TypeBox schema (`Type.Object({...})`), not a Zod schema. pi-coding-agent uses TypeBox for direct JSON Schema compatibility.
- **Tool names with hyphens or spaces.** Use `snake_case` for tool names (e.g., `get_weather`, not `get-weather` or `Get Weather`). Some LLM providers reject non-standard naming.
- **Overly generic tool descriptions.** If the LLM can't tell when to use your tool, it won't. Be specific about the trigger conditions in the description.
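The `as const` mistake can be seen in miniature (the interface below is a simplified local stand-in for the real content type):

```typescript
interface ToolResultContent { type: 'text'; text: string } // simplified stand-in

// Stored in an intermediate variable, 'text' widens to string:
const item = { type: 'text', text: 'hi' };  // inferred as { type: string; text: string }
// const broken: ToolResultContent = item;  // compile error: string is not 'text'

// With `as const`, the literal type is preserved:
const narrowed = { type: 'text' as const, text: 'hi' };
const ok: ToolResultContent = narrowed;     // compiles
```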
Run
Expected Output
Notice how the LLM correctly identifies that it needs two tools and calls them both before composing its final response. This is the agent loop in action -- the model reasons about what tools to call, executes them, and then synthesizes the results into a coherent answer.
Key Takeaways
- Tool calling lets LLMs interact with the outside world by requesting your code to execute functions on their behalf.
- Tools are defined with a `ToolDefinition` object containing a name, description, TypeBox parameter schema, and an `execute()` function.
- TypeBox is used instead of Zod because it generates JSON Schema natively, which is the format LLM APIs expect.
- The quality of your tool descriptions is critical -- the LLM reads them to decide when and how to use each tool.
- Tool execution happens inside the agent loop: the LLM can call multiple tools in sequence and reason about their results before producing a final response.
- Always handle errors inside `execute()` and return them as content rather than throwing exceptions.
Next
Chapter 04: Session Persistence -- save and resume conversations so the agent remembers what you discussed yesterday.