Compaction

Compaction and branch summarization manage context window limits by summarizing older conversation history, keeping the agent productive during long sessions.

Overview

| Feature | Compaction | Branch Summarization |
| --- | --- | --- |
| Trigger | Context tokens exceed threshold | Navigating to a different branch via /tree |
| What is summarized | Older messages in the current branch | Messages in the abandoned branch |
| Result | CompactionEntry in session | BranchSummaryEntry in session |
| LLM sees | Summary + recent messages | Summary + messages in new branch |
| Manual trigger | /compact [instructions] | /tree with summarization option |

When Compaction Triggers

Auto-compaction triggers when:

contextTokens > contextWindow - reserveTokens

Where:

  • contextTokens is calculated from the last assistant message's usage field
  • contextWindow is the model's context window size
  • reserveTokens is the configured reserve (default: 16,384 tokens)

This ensures there is always room for the next prompt and response.
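The trigger condition can be sketched as a small predicate (the function name is hypothetical; only the inequality comes from the docs above):

```typescript
// Illustrative sketch of the auto-compaction trigger condition.
function shouldAutoCompact(
  contextTokens: number, // from the last assistant message's usage field
  contextWindow: number, // the model's context window size
  reserveTokens = 16_384, // the configured reserve (default)
): boolean {
  return contextTokens > contextWindow - reserveTokens;
}

// A 200k-token model with the default reserve compacts past 183,616 tokens:
shouldAutoCompact(180_000, 200_000); // false: still room for another turn
shouldAutoCompact(184_000, 200_000); // true: compaction runs before the next prompt
```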

Manual Trigger

/compact [optional instructions]

The optional instructions focus the summary on specific aspects. For example:

/compact Focus on the database schema changes and migration steps

How Compaction Works

Step 1: Find the Cut Point

Walk backwards from the newest entry, accumulating estimated token counts. Stop when keepRecentTokens (default: 16,384) worth of content has been accumulated. The cut point is where older messages will be summarized.

Before compaction (all entries in current branch):

  [user-1] [asst-1] [tool-1] [user-2] [asst-2] [tool-2] [user-3] [asst-3] [tool-3] [user-4] [asst-4]
  |<--------- summarized ----------->|<------------ kept (keepRecentTokens) --------->|
                                      ^
                                   cut point
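The backwards walk can be sketched as follows. This is a simplified model (entry shape and function name are hypothetical): it tracks only estimated token counts, and the role-based cut point rules described later are omitted.

```typescript
// Simplified model of a session entry for the cut point walk.
interface SizedEntry {
  id: string;
  estimatedTokens: number;
}

// Returns the index of the first kept entry; entries before it are summarized.
function findCutPoint(entries: SizedEntry[], keepRecentTokens = 16_384): number {
  let kept = 0;
  // Walk backwards from the newest entry, accumulating token estimates.
  for (let i = entries.length - 1; i >= 0; i--) {
    kept += entries[i].estimatedTokens;
    if (kept >= keepRecentTokens) return i; // at least the budget is kept
  }
  return 0; // the whole branch fits within the budget; nothing to summarize
}
```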

Step 2: Generate the Summary

The messages before the cut point are serialized and sent to the LLM with a summarization prompt. The LLM produces a structured summary.

If there is an existing compaction summary from a previous compaction, it is included as context for an iterative update, so information accumulates across compactions.

Step 3: Append CompactionEntry

A CompactionEntry is appended to the session with the summary text, the ID of the first kept entry, the token count before compaction, and file tracking details.

Step 4: Rebuild Context

After compaction, buildSessionContext() uses the compaction entry:

What the LLM sees after compaction:

  [CompactionSummaryMessage: "## Goal..."] [user-3] [asst-3] [tool-3] [user-4] [asst-4]
  |<-- from compaction summary ---------->|<-- kept entries --------------------------->|

The compaction summary is injected as a user message with the prefix:

The conversation history before this point was compacted into the following summary:

<summary>
...summary content...
</summary>
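The wrapping is a simple template; in this sketch the prefix and the `<summary>` tags are taken verbatim from the format above, while the function name is illustrative:

```typescript
// Illustrative helper that wraps a summary in the injected-message format.
function buildCompactionMessage(summary: string): string {
  return (
    "The conversation history before this point was compacted into the following summary:\n\n" +
    `<summary>\n${summary}\n</summary>`
  );
}
```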

Step 5: Continue the Session

The agent continues with the reduced context. Future messages are appended as normal. When context fills up again, another compaction occurs, updating the previous summary.

Split Turns

The cut point can fall in the middle of a turn (between an assistant message with tool calls and its tool results). In this case, a "split turn" occurs:

Split turn scenario:

  [user-1] [asst-1] [tool-1a] [tool-1b] [user-2] [asst-2] [tool-2]
                      ^
               cut point falls here

Result:
  - Summarized: [user-1] [asst-1] (the full turn including tool calls)
  - Turn prefix: [tool-1a] [tool-1b] (orphaned tool results get their own summary)
  - Kept: [user-2] [asst-2] [tool-2]

When a turn is split, the orphaned tool results are summarized separately and combined into the compaction summary. This ensures tool results are never left without their corresponding tool calls.
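The split can be sketched with role-tagged messages (names are hypothetical): when the first kept message would be a tool result, the orphaned results are peeled off into a turn prefix.

```typescript
interface TaggedMsg {
  role: "user" | "assistant" | "toolResult";
}

// cut is the index where the kept region would start.
function splitAtCut<M extends TaggedMsg>(msgs: M[], cut: number) {
  let prefixEnd = cut;
  // Orphaned tool results belong to the turn being summarized, not to the kept region.
  while (prefixEnd < msgs.length && msgs[prefixEnd].role === "toolResult") {
    prefixEnd++;
  }
  return {
    toSummarize: msgs.slice(0, cut), // full turns before the cut
    turnPrefix: msgs.slice(cut, prefixEnd), // summarized separately, then merged
    kept: msgs.slice(prefixEnd), // recent messages the LLM continues from
  };
}
```

Running this on the diagram above (cut at tool-1a) yields the three groups shown.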

Cut Point Rules

  • Cut points can be at user messages or assistant messages (never tool results)
  • When the cut lands at an assistant message with tool calls, that assistant message and the tool results that follow it are kept together
  • Tool result messages are never valid cut points because they would be orphaned from their tool calls
  • The algorithm always keeps at least keepRecentTokens worth of content

CompactionEntry

interface CompactionEntry<T = unknown> extends SessionEntryBase {
  type: "compaction";
  summary: string; // Structured summary text
  firstKeptEntryId: string; // UUID of first entry kept after compaction
  tokensBefore: number; // Total context tokens before compaction
  details?: T; // Extension-specific data
  fromHook?: boolean; // True if generated by extension
}

// Default details for pi-generated compaction
interface CompactionDetails {
  readFiles: string[]; // Files only read (not modified)
  modifiedFiles: string[]; // Files written or edited
}

Branch Summarization

When It Triggers

Branch summarization occurs when navigating to a different point in the session tree via /tree and the user chooses to summarize the abandoned branch.

How It Works

Step 1: Collect Entries

Walk from the current leaf (the branch being abandoned) back to the common ancestor with the target position. All entries along this path are collected for summarization.

Session tree before navigation:

         [root]
         /    \
     [A1]     [B1]
      |         |
     [A2]      [B2]
      |         |
     [A3]      [B3] <-- target
      |
     [A4] <-- current leaf (being abandoned)

Entries to summarize: [A4, A3, A2, A1] (from leaf back to common ancestor [root])

Step 2: Prepare with Token Budget

Walk entries from newest to oldest, adding messages until the token budget is reached. This ensures the most recent context is preserved when the branch is too long for the summarization model's context window.
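This budgeted walk can be sketched generically (names are hypothetical); entries are visited newest-first, and `unshift` keeps the selected slice in chronological order for the summarizer:

```typescript
// Select the newest entries that fit within a token budget.
function selectWithinBudget<T>(
  entries: T[],
  tokensOf: (entry: T) => number,
  budget: number,
): T[] {
  const selected: T[] = [];
  let used = 0;
  for (let i = entries.length - 1; i >= 0; i--) {
    const cost = tokensOf(entries[i]);
    if (used + cost > budget) break; // older entries are dropped from the summary input
    selected.unshift(entries[i]); // preserve chronological order
    used += cost;
  }
  return selected;
}
```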

Step 3: Extract File Operations

Collect file operations from:

  • Tool calls in assistant messages (read, write, edit tools)
  • Existing BranchSummaryEntry details (for cumulative tracking across multiple navigations)

Step 4: Generate Summary

The collected messages are serialized and sent to the LLM with a summarization prompt. The LLM produces a structured summary.

Step 5: Append BranchSummaryEntry

A BranchSummaryEntry is appended to the session at the branch point (the target position).

Session tree after navigation:

         [root]
         /    \
     [A1]     [B1]
      |         |
     [A2]      [B2]
      |         |
     [A3]      [B3] <-- target
      |         |
     [A4]      [BranchSummary of A1->A4] <-- appended here
                |
               (new leaf, ready for new messages)

Cumulative File Tracking

File operations are tracked cumulatively across branch summaries. When summarizing a branch that itself contains BranchSummaryEntry entries, the file operations from those entries are merged with the new file operations. This ensures complete file tracking even across multiple tree navigations.
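The merge can be sketched as below. The function name is hypothetical; the precedence rule (a file that was ever modified never re-appears as read-only) follows from the readFiles definition in the interfaces ("files only read, not modified").

```typescript
interface FileOps {
  readFiles: string[]; // files only read (not modified)
  modifiedFiles: string[]; // files written or edited
}

// Merge file ops from an existing BranchSummaryEntry with newly collected ones.
function mergeFileOps(prev: FileOps, next: FileOps): FileOps {
  const modified = new Set([...prev.modifiedFiles, ...next.modifiedFiles]);
  // A file counts as "read" only if it was never modified in either set.
  const read = new Set(
    [...prev.readFiles, ...next.readFiles].filter((f) => !modified.has(f)),
  );
  return { readFiles: [...read], modifiedFiles: [...modified] };
}
```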

BranchSummaryEntry

interface BranchSummaryEntry<T = unknown> extends SessionEntryBase {
  type: "branch_summary";
  fromId: string; // Entry ID where the branch was abandoned
  summary: string; // Structured summary text
  details?: T; // Extension-specific data
  fromHook?: boolean; // True if generated by extension
}

// Default details for pi-generated branch summaries
interface BranchSummaryDetails {
  readFiles: string[]; // Files only read (not modified)
  modifiedFiles: string[]; // Files written or edited
}

Summary Format

Both compaction and branch summaries follow a structured format:

## Goal

What the conversation was trying to accomplish.

## Constraints

Any constraints or requirements mentioned.

## Progress

- Step 1 completed
- Step 2 in progress

## Key Decisions

- Chose REST over GraphQL because...
- Using PostgreSQL for...

## Next Steps

- Implement authentication
- Add error handling

## Critical Context

Important information needed going forward.

<read-files>
src/config.ts
src/utils/helpers.ts
</read-files>

<modified-files>
src/api/routes.ts
src/db/schema.ts
</modified-files>

Message Serialization

Before summarization, messages are serialized to a text format using serializeConversation(). This prevents the summarization model from treating the content as a conversation to continue:

import {
  serializeConversation,
  convertToLlm,
} from "@mariozechner/pi-coding-agent";

// Convert AgentMessages to LLM messages first
const llmMessages = convertToLlm(agentMessages);
// Then serialize to text
const text = serializeConversation(llmMessages);

The serialized format uses clear role markers ([USER], [ASSISTANT], [TOOL_RESULT]) and separates messages with dividers.

Custom Summarization via Extensions

session_before_compact

Extensions can intercept compaction and provide custom summaries:

import {
  serializeConversation,
  convertToLlm,
} from "@mariozechner/pi-coding-agent";

pi.on("session_before_compact", async (event) => {
  // event.preparation contains all the data for compaction
  const {
    messagesToSummarize,
    turnPrefixMessages,
    firstKeptEntryId,
    tokensBefore,
    previousSummary,
    fileOps,
    settings,
  } = event.preparation;

  // Serialize messages for your custom summarizer
  const llmMessages = convertToLlm(messagesToSummarize);
  const serialized = serializeConversation(llmMessages);

  // Generate your own summary (e.g., with a different model or prompt)
  const customSummary = await myCustomSummarizer(serialized, previousSummary);

  // Provide the result back to pi
  event.setResult({
    summary: customSummary,
    firstKeptEntryId,
    tokensBefore,
    details: {
      // Your custom details (stored in CompactionEntry.details)
      artifactIndex: myArtifactIndex,
      version: "2.0",
    },
  });
});

session_before_tree

Extensions can intercept branch summarization:

pi.on("session_before_tree", async (event) => {
  // Customize summarization behavior
  event.setSummarize(true);
  event.setCustomInstructions("Focus on API contract changes");
  event.setReplaceInstructions(false); // Append to default prompt

  // Or cancel navigation entirely
  // event.cancel();
});

Settings

Compaction behavior is controlled via settings.jsonl:

{
  "compaction": {
    "enabled": true,
    "reserveTokens": 16384,
    "keepRecentTokens": 16384
  }
}

| Setting | Default | Description |
| --- | --- | --- |
| compaction.enabled | true | Enable/disable auto-compaction |
| compaction.reserveTokens | 16384 | Tokens reserved for the prompt and response. Compaction triggers when contextTokens > contextWindow - reserveTokens. |
| compaction.keepRecentTokens | 16384 | Minimum tokens to keep after compaction. The cut point algorithm walks backwards, keeping at least this many tokens of recent content. |

You can also toggle auto-compaction at runtime:

session.setAutoCompactionEnabled(false); // Disable
session.setAutoCompactionEnabled(true); // Enable

Or manually trigger compaction:

const result = await session.compact("Focus on authentication changes");
// result: { summary, firstKeptEntryId, tokensBefore, details }