08 - Full CLI Agent
The grand finale -- a complete, production-quality CLI agent that combines every pattern from chapters 01 through 07 into a single cohesive application.
Why This Chapter Matters
Over the previous seven chapters, you have learned each building block of an AI agent in isolation: model creation, streaming, custom tools, session persistence, confirmation gating, system prompts with skills, and multi-session management. Each chapter focused on one concept with minimal code.
Real applications do not work that way. In production, all of these pieces must work together harmoniously: streaming must not interfere with confirmation prompts, session switching must clean up event listeners, abort signals must propagate through tool execution, and the REPL must remain responsive during long-running operations.
This chapter shows you how to compose those building blocks into a single, well-structured CLI agent. More importantly, it introduces two new production-critical patterns:
- DeltaBatcher -- a buffering layer that smooths out the character-by-character stuttering of raw LLM streaming into fluid, readable terminal output.
- Abort handling -- graceful cancellation of in-progress generation via Ctrl+C, so users are never trapped waiting for a runaway response.
By the end of this chapter, you will have a fully functional agent that can read and write files, execute shell commands, look up weather, manage multiple sessions, require confirmation for dangerous operations, and be interrupted at any time.
Architecture Overview
The agent is organized into four modules, each with a clear responsibility:
How the Pieces Fit Together
Here is how the major components interact at runtime:
The data flow follows a clean pipeline:
- User input enters via the readline REPL in `index.ts`
- Slash commands (like `/sessions`, `/new`) are intercepted by `handleCommand()` in `commands.ts` and handled without involving the LLM
- Regular prompts are forwarded to `AgentRuntime.prompt()`, which sends them to the underlying `AgentSession`
- Streaming responses flow back through the event subscription, through the `DeltaBatcher`, and into `stdout`
- Tool calls are dispatched to tool definitions in `tools.ts`, some of which require confirmation via the `ConfirmationWaiter`
- Ctrl+C triggers the SIGINT handler, which calls `runtime.abort()` to cancel any in-progress generation
Features
Commands
Deep Dive: DeltaBatcher
The Problem
When an LLM streams a response, it emits text in tiny fragments -- often just one or two characters at a time. If you write each fragment directly to the terminal with process.stdout.write(), you get visible character-by-character stuttering:
This is distracting and makes the agent feel slow, even though the total time to complete the response is the same. The issue is purely perceptual: humans read in chunks, not characters.
The Solution
DeltaBatcher is a small utility that collects incoming text fragments and flushes them in batches on a fixed interval. Instead of writing "H", "e", "l", "l", "o" as five separate operations, it waits a short period and writes "Hello" as a single operation.
The result is smooth, fluid text output that feels like a human typing at a natural pace.
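To make the idea concrete, here is a minimal sketch of what such a batcher could look like. The class name matches the chapter, but the exact constructor shape and member names are illustrative, not the library's actual API:

```typescript
// Illustrative DeltaBatcher: buffers streamed text fragments and flushes
// them on a fixed interval. Names and shape are assumptions for this sketch.
class DeltaBatcher {
  private buffer = "";
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(
    private readonly write: (text: string) => void,
    private readonly intervalMs = 32,
  ) {}

  // Collect a fragment; start the flush timer lazily on the first push.
  push(delta: string): void {
    this.buffer += delta;
    if (this.timer === null) {
      this.timer = setInterval(() => this.flush(), this.intervalMs);
    }
  }

  // Write out everything buffered and stop the timer until the next push.
  flush(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
    if (this.buffer.length > 0) {
      this.write(this.buffer);
      this.buffer = "";
    }
  }
}
```

Starting the timer lazily means an idle batcher holds no pending timer, so the process can exit cleanly between responses.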
Why 32 Milliseconds?
The default batch interval is 32ms, and this number is not arbitrary. Here is the reasoning:
- 16ms is one frame at 60fps -- the threshold below which humans cannot perceive individual updates. This would be smooth but produces very small batches (often still just 1-2 characters).
- 32ms (two frames at 60fps) is the sweet spot. It collects enough characters per batch to produce readable chunks (typically 3-10 characters depending on model speed), while still being fast enough that the output feels real-time.
- 100ms+ would produce noticeably chunky output -- the text would appear in bursts rather than flowing.
The 32ms interval also has a practical benefit: it roughly aligns with the typical token generation interval of fast LLM providers. Most models emit tokens every 20-50ms, so a 32ms batch interval usually collects one full token per flush -- exactly the right granularity for readable output.
The batch interval is configurable via the constructor. If you are building an agent for a slower connection or model, try increasing it to 50-100ms. For a local model that generates tokens very quickly, you might decrease it to 16ms. Tune it based on your use case.
How DeltaBatcher Integrates
In the AgentRuntime, the batcher sits between the event subscription and the terminal:
The flush() method is called when the response completes, ensuring any remaining buffered text is written out immediately. Without this, the last few characters of a response might be stuck in the buffer.
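A sketch of that wiring might look like the following. The event names (`"text_delta"`, `"done"`) are assumptions for illustration; the point is the shape: deltas go into the batcher, and completion triggers a final flush:

```typescript
// Wire a streaming event source to a batcher. Event names are assumed
// for this sketch; only the push-on-delta / flush-on-done shape matters.
function attachOutput(
  events: { on(name: string, fn: (payload: string) => void): void },
  batcher: { push(text: string): void; flush(): void },
): void {
  events.on("text_delta", (delta) => batcher.push(delta));
  // Flush on completion so the last few buffered characters are not lost.
  events.on("done", () => batcher.flush());
}
```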
Deep Dive: The Abort Pattern
Why Abort Matters
Sometimes the agent goes off the rails -- it starts writing a 2,000-word essay when you wanted a one-line answer, or it begins executing a sequence of shell commands you did not intend. The user needs an escape hatch.
In a CLI application, the universal escape hatch is Ctrl+C (SIGINT). When the user presses Ctrl+C, we want to:
- Stop the LLM from generating more text -- cancel the API request
- Stop any in-progress tool execution -- abort shell commands, file operations, etc.
- Return control to the REPL -- so the user can type a new message or command
- Preserve the session -- the conversation up to the abort point should be saved
How It Works
Inside AgentRuntime, abort() calls session.abort() on the underlying AgentSession. This triggers a cancellation cascade:
- The API request is cancelled via an `AbortController` signal
- Tool execution receives the abort through the `signal` parameter in `execute(toolCallId, params, signal, onUpdate)` -- well-behaved tools check `signal.aborted` periodically
- The session emits an abort event that the event subscription can handle
- The `DeltaBatcher` is flushed to write out any remaining buffered text
Ctrl+C in Node.js sends SIGINT to the process, which by default terminates it. By registering a handler with process.on('SIGINT', ...), we intercept the signal and perform a graceful abort instead. However, pressing Ctrl+C twice rapidly will usually force-kill the process (depending on the platform). This is intentional -- it gives the user a way out if the graceful abort hangs.
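A minimal sketch of such a handler, assuming a runtime object with an `abort()` method (the double-press force-exit is implemented manually here, since our handler replaces the default behavior):

```typescript
// Graceful SIGINT handling: first Ctrl+C aborts the in-flight generation,
// a second Ctrl+C within one second force-exits. The runtime shape is an
// assumption for this sketch.
function installSigintHandler(runtime: { abort(): void }): void {
  let lastSigint = 0;
  process.on("SIGINT", () => {
    const now = Date.now();
    if (now - lastSigint < 1000) {
      process.exit(130); // conventional exit code for death by SIGINT
    }
    lastSigint = now;
    runtime.abort(); // cancel generation; control returns to the REPL
  });
}
```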
Abort and Tool Execution
For tools that perform long-running operations (like shell commands or API calls), the abort signal is propagated through the signal parameter:
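Here is a sketch of a well-behaved long-running tool. The tool itself (`sleep`) is hypothetical, but its `execute()` mirrors the signature described above, polling `signal.aborted` in small slices so a mid-operation abort is noticed quickly:

```typescript
// Hypothetical long-running tool that respects the abort signal. The
// execute() signature mirrors the one described in the text.
const sleepTool = {
  name: "sleep",
  async execute(
    toolCallId: string,
    params: { ms: number },
    signal: AbortSignal,
  ): Promise<string> {
    const started = Date.now();
    while (Date.now() - started < params.ms) {
      // Check the signal between slices of work, not just at the start.
      if (signal.aborted) return "aborted";
      await new Promise((resolve) => setTimeout(resolve, 10));
    }
    return "done";
  },
};
```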
AgentRuntime
The AgentRuntime class is the central coordinator. It encapsulates all session lifecycle management, confirmation handling, and streaming into a single cohesive API:
Why a Runtime Class?
You might wonder why we introduced a class instead of keeping everything as loose functions (like in the earlier chapters). The answer is state management. The runtime needs to track:
- The current `AgentSession` instance (which changes on session switch)
- The `DeltaBatcher` instance (which needs flushing on abort and session switch)
- The confirmation waiter (which is shared across tools)
- The `SessionManager` (which changes on session switch)
- Whether a prompt is currently in-progress (to prevent double-prompting)
A class encapsulates this mutable state behind a clean API, preventing the caller from accidentally corrupting it. The alternative -- a bag of global variables -- becomes unmaintainable as complexity grows.
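A stripped-down skeleton illustrates the idea. This is not the chapter's full `AgentRuntime` -- the member names and the session shape are assumptions -- but it shows mutable state (the in-progress flag) guarded behind a small API:

```typescript
// Skeleton runtime: mutable state lives behind a small API instead of in
// module-level globals. The session shape is an assumption for this sketch.
class MiniRuntime {
  private busy = false; // prevents double-prompting

  constructor(
    private readonly session: { prompt(text: string): Promise<string>; abort(): void },
    private readonly onText: (text: string) => void,
  ) {}

  async prompt(text: string): Promise<void> {
    if (this.busy) throw new Error("a prompt is already in progress");
    this.busy = true;
    try {
      this.onText(await this.session.prompt(text));
    } finally {
      this.busy = false; // always release, even on abort or error
    }
  }

  abort(): void {
    this.session.abort();
  }
}
```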
RuntimeConfig
The includeCodingTools flag deserves special mention. When set to true, the runtime includes pi-coding-agent's built-in coding tools: read, write, edit, and bash. These tools give the agent the ability to read and modify files on disk and execute shell commands -- powerful capabilities that effectively turn the agent into a coding assistant.
Enabling coding tools gives the agent filesystem and shell access. This is powerful but potentially dangerous. Always combine includeCodingTools: true with the confirmation pattern for destructive operations, and consider restricting the working directory.
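For reference, a config of this kind might be shaped roughly as follows. Apart from `includeCodingTools`, every field name here is an assumption for illustration:

```typescript
// A possible shape for RuntimeConfig. Only includeCodingTools comes from
// the text; the other fields are hypothetical.
interface RuntimeConfig {
  model: string;                // model identifier to use
  includeCodingTools: boolean;  // enable read/write/edit/bash tools
  batchIntervalMs?: number;     // DeltaBatcher flush interval (default 32)
  sessionDir?: string;          // where session files are persisted
}

const config: RuntimeConfig = {
  model: "example-model",
  includeCodingTools: true,
  batchIntervalMs: 32,
};
```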
Main Entry Point
Notice how clean the entry point is. All the complexity of session management, streaming, confirmation, and abort handling is hidden inside AgentRuntime. The entry point only needs to:
- Create the runtime with configuration
- Set up the SIGINT handler
- Run the REPL loop
This is the payoff of good encapsulation -- the top-level code reads like a description of what the application does, not how it does it.
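The REPL loop at the heart of the entry point can be sketched like this. The line source is injected rather than hard-wired to readline so the loop can be exercised in tests; the runtime and command-handler shapes are assumptions:

```typescript
// REPL loop sketch: commands are intercepted, everything else goes to the
// LLM. The injected callbacks stand in for readline, handleCommand(), and
// AgentRuntime.prompt().
async function runRepl(
  nextLine: () => Promise<string | null>, // resolves null on EOF / quit
  runtime: { prompt(text: string): Promise<void> },
  isCommand: (line: string) => Promise<boolean>,
): Promise<void> {
  for (;;) {
    const line = await nextLine();
    if (line === null) break;
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;
    if (await isCommand(trimmed)) continue; // handled by the command layer
    await runtime.prompt(trimmed);
  }
}
```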
Command Handler
The command handler is extracted into a separate module for testability and separation of concerns. It takes user input and the runtime, and returns true if the input was a command (so the REPL knows not to send it to the LLM):
The handleCommand() function returns a boolean to support the "intercept" pattern: if the input is a command, handle it and return true so the caller skips the prompt step. If it is not a command, return false so the caller knows to send it to the LLM. This clean separation means neither the REPL loop nor the command handler need to know about each other's internals.
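A sketch of the intercept pattern, using two of the chapter's commands. The runtime methods invoked here (`newSession`, `listSessions`) are assumptions:

```typescript
// Hypothetical handleCommand(): true means "this was a command, do not
// forward to the LLM". Runtime method names are assumed for this sketch.
async function handleCommand(
  input: string,
  runtime: { newSession(): Promise<void>; listSessions(): Promise<string[]> },
): Promise<boolean> {
  if (!input.startsWith("/")) return false; // regular prompt: send to the LLM
  const [cmd] = input.split(/\s+/);
  switch (cmd) {
    case "/new":
      await runtime.newSession();
      return true;
    case "/sessions":
      console.log((await runtime.listSessions()).join("\n"));
      return true;
    default:
      console.log(`Unknown command: ${cmd}`);
      return true; // still a command; do not forward to the LLM
  }
}
```

Note that even an unknown slash command returns `true`: a typo like `/sesions` should produce an error message, not get sent to the model as a prompt.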
Run
Try It
Try pressing Ctrl+C during a long response to see the abort in action. The agent stops immediately, and you can type a new message.
Where to Go From Here
Congratulations -- you have built a production-quality CLI agent from scratch. Here are some ideas for extending it further:
Extension Ideas
Add more tools: The agent's capabilities are limited only by the tools you provide. Consider adding tools for:
- Web browsing (fetch and summarize URLs)
- Database queries (read from SQLite, PostgreSQL)
- API integrations (GitHub, Jira, Slack)
- Image generation or analysis
Implement conversation branching: Allow users to "fork" a conversation at any point, creating a new session that starts from the current history. This is useful for exploring alternative approaches to a problem.
Add a tool approval allowlist: Instead of confirming every dangerous tool call, maintain a per-session allowlist of approved operations. Once a user approves "delete files in /tmp," auto-approve subsequent deletions in that directory.
Build a web UI: Replace the readline REPL with a web interface using React. The AgentRuntime class is already UI-agnostic -- you just need to swap the input/output layer.
Add cost tracking: Track token usage per session and display costs. This helps users stay within budget and identify prompts that are unexpectedly expensive.
Implement context window management: When conversations get long, older messages may need to be summarized or evicted to stay within the model's context window. Implement a strategy for this (e.g., summarize messages older than N turns).
Add MCP (Model Context Protocol) support: Connect your agent to external tool servers using the MCP protocol, giving it access to tools hosted by other applications.
Architectural Lessons
As you extend the agent, keep these principles in mind:
- **Separate concerns into modules:** The four-file architecture (index, runtime, tools, commands) scales well. Add new modules for new concerns (e.g., `cost.ts` for token tracking, `context.ts` for context management).
- **Use dependency injection:** Tools receive their dependencies (like the confirmation waiter) as constructor arguments. This makes them testable and reusable across different agent configurations.
- **Always clean up resources:** Every session switch needs `dispose()`, every SIGINT handler needs `abort()`, every stdin listener needs `cleanup()`. Resource leaks are the most common source of bugs in long-running agent applications.
- **Buffer your output:** The DeltaBatcher pattern applies anywhere you stream text to a UI. Whether it is a terminal, a web page, or a mobile app, batching produces a better user experience than character-by-character rendering.
- **Make abort a first-class citizen:** Users will Ctrl+C. Plan for it. Every long-running tool should accept and respect the abort signal.
Key Takeaways
- **Composition is the key skill:** Building a production agent is not about learning one new concept -- it is about composing all the concepts from previous chapters into a coherent whole. The `AgentRuntime` class is the glue that holds everything together.
- **DeltaBatcher solves perceptual latency:** The 32ms batching interval transforms character-by-character stuttering into smooth, readable output. Small details like this make the difference between a prototype and a product.
- **Abort handling is non-negotiable:** Users must always be able to interrupt a runaway agent. The SIGINT handler plus `session.abort()` provides this escape hatch.
- **Good architecture enables extension:** The four-module structure (entry, runtime, tools, commands) makes it easy to add new features without touching existing code. Each module has a single responsibility and a clean interface.
- **This is a foundation, not a ceiling:** The agent you built in this chapter is a starting point. The patterns you have learned -- streaming, tools, sessions, confirmation, skills, abort -- are the building blocks for any AI agent application, whether it is a CLI tool, a desktop app, or a cloud service.