Context Management

How BaseAgent manages memory and prevents token overflow

Why Context Management Matters

Large Language Models have finite context windows. Without proper management:

- "Context too long" errors terminate sessions
- Critical information gets lost
- Response quality degrades
- Costs increase unnecessarily

BaseAgent handles this in two stages, inspired by OpenCode and Codex: it first prunes old tool outputs, and if that is not enough it asks the LLM to summarize the conversation.

Context Window Overview

graph TB
    subgraph Window["Claude Opus 4.5 Context Window (200K tokens)"]
        Output["Reserved for Output<br/>32K tokens"]
        Usable["Usable Context<br/>168K tokens"]
    end
    
    subgraph Thresholds["Management Thresholds"]
        Safe["Safe Zone<br/>< 85% (143K)"]
        Warning["Warning Zone<br/>85-100%"]
        Overflow["Overflow<br/>> 168K"]
    end
    
    Usable --> Safe
    Usable --> Warning
    Usable --> Overflow
    
    style Safe fill:#4CAF50,color:#fff
    style Warning fill:#FF9800,color:#fff
    style Overflow fill:#F44336,color:#fff

Key Numbers

| Metric | Value | Description |
| --- | --- | --- |
| Total context | 200,000 | Model's full context window |
| Output reserve | 32,000 | Reserved for LLM response |
| Usable context | 168,000 | Available for messages |
| Compaction threshold | 85% | Trigger at 142,800 tokens |
| Prune protect | 40,000 | Recent tool output to keep |
| Prune minimum | 20,000 | Minimum savings to prune |
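The derived rows follow directly from the base settings. A small sketch of the arithmetic (the constant names are illustrative and not necessarily BaseAgent's own identifiers):

```python
# Illustrative names; the values are the ones listed under Key Numbers.
MODEL_CONTEXT_LIMIT = 200_000
OUTPUT_RESERVE = 32_000
COMPACT_THRESHOLD = 0.85

# Whatever is not reserved for the response is available for messages
usable_context = MODEL_CONTEXT_LIMIT - OUTPUT_RESERVE

# round() guards against floating-point error in the percentage
compaction_trigger = round(usable_context * COMPACT_THRESHOLD)

print(usable_context)      # 168000
print(compaction_trigger)  # 142800
```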

Token Estimation

BaseAgent estimates tokens using a simple heuristic:

# 1 token ≈ 4 characters
def estimate_tokens(text: str) -> int:
    return len(text) // 4

Message Token Components

graph LR
    subgraph Message["Message Token Estimation"]
        Content["Content<br/>(text / 4)"]
        Images["Images<br/>(~1000 each)"]
        ToolCalls["Tool Calls<br/>(name + args)"]
        Overhead["Role Overhead<br/>(~4 tokens)"]
    end
    
    Content --> Total["Total Tokens"]
    Images --> Total
    ToolCalls --> Total
    Overhead --> Total
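
The four components in the diagram could be combined roughly as follows. This is a sketch only: the message layout (OpenAI-style `content` parts and `tool_calls` entries) and the names `estimate_message_tokens`, `IMAGE_TOKENS`, and `ROLE_OVERHEAD` are assumptions, not BaseAgent's confirmed internals.

```python
import json

IMAGE_TOKENS = 1000  # Flat per-image estimate from the diagram
ROLE_OVERHEAD = 4    # Approximate per-message overhead from the diagram


def estimate_tokens(text: str) -> int:
    return len(text) // 4


def estimate_message_tokens(msg: dict) -> int:
    """Estimate one message, assuming an OpenAI-style message dict."""
    total = ROLE_OVERHEAD
    content = msg.get("content") or ""
    if isinstance(content, str):
        total += estimate_tokens(content)
    else:
        # Multi-part content: a list of text and image parts
        for part in content:
            if part.get("type") == "text":
                total += estimate_tokens(part.get("text", ""))
            elif part.get("type") == "image_url":
                total += IMAGE_TOKENS
    for call in msg.get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments", "")
        if not isinstance(args, str):
            args = json.dumps(args)
        # Tool calls cost their name plus their serialized arguments
        total += estimate_tokens(fn.get("name", "")) + estimate_tokens(args)
    return total


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "x" * 400},
        {"type": "image_url", "image_url": {"url": "..."}},
    ],
}
print(estimate_message_tokens(message))  # 4 + 100 + 1000 = 1104
```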

Context Management Pipeline

flowchart TB
    subgraph Input["Every Iteration"]
        Messages["Current Messages"]
    end
    
    subgraph Detection["1. Detection"]
        Estimate["Estimate Total Tokens"]
        Check{"Above 85%<br/>Threshold?"}
    end
    
    subgraph Pruning["2. Pruning (First Pass)"]
        Scan["Scan Backwards"]
        Protect["Protect Last 40K<br/>Tool Output Tokens"]
        Clear["Clear Old Tool Outputs"]
        CheckAgain{"Still Above<br/>Threshold?"}
    end
    
    subgraph Compaction["3. AI Compaction (Second Pass)"]
        Summary["Generate Summary<br/>via LLM"]
        Rebuild["Rebuild Messages:<br/>System + Summary"]
    end
    
    subgraph Output["Continue Loop"]
        Managed["Managed Messages"]
    end
    
    Messages --> Estimate --> Check
    Check -->|No| Managed
    Check -->|Yes| Scan --> Protect --> Clear --> CheckAgain
    CheckAgain -->|No| Managed
    CheckAgain -->|Yes| Summary --> Rebuild --> Managed
    
    style Pruning fill:#FF9800,color:#fff
    style Compaction fill:#9C27B0,color:#fff
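
The control flow in the flowchart can be sketched as a single function. The name `manage_context` and the idea of passing the three steps in as callables are assumptions made to keep the example self-contained; BaseAgent's actual wiring may differ.

```python
from typing import Callable

Messages = list[dict]


def manage_context(
    messages: Messages,
    estimate: Callable[[Messages], int],
    prune: Callable[[Messages], Messages],
    compact: Callable[[Messages], Messages],
    usable_context: int = 168_000,
    threshold: float = 0.85,
) -> Messages:
    """Run detection, then pruning, then compaction only if still needed."""
    limit = round(usable_context * threshold)

    # 1. Detection: nothing to do while usage is at or below the threshold
    if estimate(messages) <= limit:
        return messages

    # 2. Pruning: the cheap first pass, no LLM call involved
    messages = prune(messages)
    if estimate(messages) <= limit:
        return messages

    # 3. AI compaction: the second pass, costs one LLM request
    return compact(messages)
```

Ordering the passes this way means the LLM summary, which discards the most detail, runs only when clearing old tool outputs did not free enough room.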

Stage 1: Tool Output Pruning

The first defense against context overflow is pruning old tool outputs.

Strategy

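1. Scan messages backwards (most recent first).
2. Skip everything in the 2 most recent user turns; these are always protected.
3. Accumulate the estimated tokens of each tool output.
4. Once more than 40K tokens have accumulated, mark every older tool output for pruning.
5. Apply the pruning only if the total savings exceed 20K tokens.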

Implementation

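PRUNE_PROTECT = 40_000  # Recent tool output tokens that are never pruned
PRUNE_MINIMUM = 20_000  # Skip pruning unless it saves more than this

def prune_old_tool_outputs(messages, protect_last_turns=2):
    """Clear old tool outputs in place and return the same list."""
    total = 0      # Tool output tokens seen so far, newest first
    pruned = 0     # Tokens that pruning would recover
    to_prune = []  # Indexes of tool messages to clear
    turns = 0      # User turns seen while scanning backwards
    
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        
        if msg["role"] == "user":
            turns += 1
        
        # Never touch anything inside the most recent turns
        if turns < protect_last_turns:
            continue
        
        if msg["role"] == "tool":
            # Content may be missing or None; treat that as empty
            content = msg.get("content") or ""
            estimate = len(str(content)) // 4
            total += estimate
            
            # Outputs beyond the protected budget are candidates
            if total > PRUNE_PROTECT:
                pruned += estimate
                to_prune.append(i)
    
    # Only rewrite history when the savings are worthwhile
    if pruned > PRUNE_MINIMUM:
        for idx in to_prune:
            messages[idx]["content"] = "[Old tool result content cleared]"
    
    return messages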

Visual Example

graph TB
    subgraph Before["Before Pruning (150K tokens)"]
        S1["System Prompt<br/>5K tokens"]
        U1["User Instruction<br/>1K tokens"]
        A1["Assistant + Tools<br/>10K tokens"]
        T1["Tool Results (old)<br/>50K tokens"]
        A2["Assistant + Tools<br/>10K tokens"]
        T2["Tool Results (old)<br/>40K tokens"]
        A3["Assistant + Tools<br/>10K tokens"]
        T3["Tool Results (recent)<br/>24K tokens"]
    end
    
    subgraph After["After Pruning (60K tokens)"]
        S2["System Prompt<br/>5K tokens"]
        U2["User Instruction<br/>1K tokens"]
        A4["Assistant + Tools<br/>10K tokens"]
        T4["[cleared]<br/>~0 tokens"]
        A5["Assistant + Tools<br/>10K tokens"]
        T5["[cleared]<br/>~0 tokens"]
        A6["Assistant + Tools<br/>10K tokens"]
        T6["Tool Results (protected)<br/>24K tokens"]
    end
    
    T1 -.-> T4
    T2 -.-> T5
    T3 --> T6
    
    style T4 fill:#FF9800,color:#fff
    style T5 fill:#FF9800,color:#fff
    style T6 fill:#4CAF50,color:#fff
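
The numbers in the diagram can be checked with a short simulation of the accumulate-then-prune rule. It is deliberately simplified: it ignores the turn-protection step and the few tokens taken up by the replacement marker, and the variable names are illustrative.

```python
PRUNE_PROTECT = 40_000
PRUNE_MINIMUM = 20_000

# Tool result sizes from the diagram, oldest first
tool_results = [50_000, 40_000, 24_000]
total_before = 150_000

seen = 0      # Tool output tokens accumulated, newest first
cleared = []  # Sizes of the results that fall outside the protected budget
for size in reversed(tool_results):
    seen += size
    if seen > PRUNE_PROTECT:
        cleared.append(size)

savings = sum(cleared)
if savings <= PRUNE_MINIMUM:
    # Not worth rewriting history for a small gain
    cleared, savings = [], 0

print(cleared)                 # [40000, 50000]
print(total_before - savings)  # 60000
```

The 24K result stays because the running total is still under 40K when it is reached. The 40K result pushes the total to 64K, so it and everything older is cleared.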

Stage 2: AI Compaction

When pruning isn't enough, BaseAgent uses the LLM to summarize the conversation.

Compaction Process

sequenceDiagram
    participant Loop as Agent Loop
    participant Compact as Compaction
    participant LLM as LLM API

    Loop->>Compact: Context still too large
    Compact->>Compact: Add compaction prompt
    Compact->>LLM: Request summary
    LLM-->>Compact: Summary response
    Compact->>Compact: Build new messages
    Compact-->>Loop: [System, Summary]

Compaction Prompt

COMPACTION_PROMPT = """
You are performing a CONTEXT CHECKPOINT COMPACTION. 
Create a handoff summary for another LLM that will resume the task.

Include:
- Current progress and key decisions made
- Important context, constraints, or user preferences
- What remains to be done (clear next steps)
- Any critical data, examples, or references needed to continue
- Which files were modified and how
- Any errors encountered and how they were resolved

Be concise, structured, and focused on helping the next LLM 
seamlessly continue the work. Use bullet points and clear sections.
"""

Result

The compacted messages are:

compacted = [
    {"role": "system", "content": original_system_prompt},
    {"role": "user", "content": SUMMARY_PREFIX + llm_summary},
]

Summary Prefix

SUMMARY_PREFIX = """
Another language model started to solve this problem and produced 
a summary of its thinking process. You also have access to the state 
of the tools that were used. Use this to build on the work that has 
already been done and avoid duplicating work.

Here is the summary from the previous context:

"""
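
Putting the sequence diagram, the compaction prompt, and the summary prefix together, the compaction step might look like the sketch below. The function name `compact_messages`, the `call_llm` callable, and sending the compaction prompt as a `user` message are assumptions; the prompt and prefix are passed in as parameters so the example stays self-contained.

```python
from typing import Callable


def compact_messages(
    messages: list[dict],
    call_llm: Callable[[list[dict]], str],
    compaction_prompt: str,
    summary_prefix: str,
) -> list[dict]:
    """Replace the history with the system prompt plus an LLM-written summary."""
    # Append the compaction prompt and ask the model for a handoff summary
    request = messages + [{"role": "user", "content": compaction_prompt}]
    summary = call_llm(request)

    # Keep the original system prompt, if there is one
    system = [m for m in messages if m["role"] == "system"][:1]
    return system + [{"role": "user", "content": summary_prefix + summary}]


# Usage with a stand-in for the real LLM call
history = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Fix the failing test."},
    {"role": "assistant", "content": "Looking at tests/test_api.py..."},
]
fake_llm = lambda request: "- Fixed off-by-one in pagination"
compacted = compact_messages(history, fake_llm, "Summarize.", "Previous summary:\n")
print(len(compacted))  # 2
```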

Middle-Out Truncation

When a single tool output exceeds its per-tool limit (2,500 tokens by default), BaseAgent truncates it middle-out, dropping the middle and keeping the start and end:

graph LR
    subgraph Original["Original Output"]
        O1["Start<br/>(headers, definitions)"]
        O2["Middle<br/>(repetitive data)"]
        O3["End<br/>(results, errors)"]
    end
    
    subgraph Truncated["Truncated Output"]
        T1["Start<br/>(preserved)"]
        T2["[...truncated...]"]
        T3["End<br/>(preserved)"]
    end
    
    O1 --> T1
    O2 -.-> T2
    O3 --> T3
    
    style O2 fill:#FF9800,color:#fff
    style T2 fill:#FF9800,color:#fff

Implementation

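def middle_out_truncate(text: str, max_tokens: int = 2500) -> str:
    max_chars = max_tokens * 4  # 4 chars per token
    
    if len(text) <= max_chars:
        return text
    
    # Split the budget between head and tail, leaving room for the marker.
    # Clamp to 1: with keep == 0, text[-0:] would return the whole string.
    keep = max(max_chars // 2 - 50, 1)
    return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}"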
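
A quick usage check on a synthetic log shows the effect. The function is repeated here, with a clamp for very small budgets, so the snippet runs on its own; the log contents are made up for illustration.

```python
def middle_out_truncate(text: str, max_tokens: int = 2500) -> str:
    max_chars = max_tokens * 4  # 4 chars per token
    if len(text) <= max_chars:
        return text
    keep = max(max_chars // 2 - 50, 1)  # Room for marker, never zero
    return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}"


# A header, a long repetitive middle, and an error at the end
log = "HEADER\n" + "row\n" * 5000 + "ERROR: build failed\n"
short = middle_out_truncate(log, max_tokens=100)

print(len(log))                                  # 20027
print(len(short) <= 400)                         # True
print(short.startswith("HEADER"))                # True
print(short.endswith("ERROR: build failed\n"))   # True
```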

Why Middle-Out?

| Section | Contains | Value |
| --- | --- | --- |
| Start | Headers, imports, definitions | High |
| Middle | Repetitive data, logs | Low |
| End | Results, errors, summaries | High |

Configuration Options

| Setting | Default | Description |
| --- | --- | --- |
| `model_context_limit` | 200,000 | Total context window |
| `output_token_max` | 32,000 | Reserved for output |
| `auto_compact_threshold` | 0.85 | Trigger threshold |
| `prune_protect` | 40,000 | Recent tool tokens to keep |
| `prune_minimum` | 20,000 | Minimum savings to prune |
| `max_output_tokens` | 2,500 | Per-tool output limit |

Tuning Guidelines

For Long Tasks:

"auto_compact_threshold": 0.70,  # More aggressive
"prune_protect": 30_000,          # Protect less

For Complex Tasks (need more context):

"auto_compact_threshold": 0.90,  # Less aggressive
"prune_protect": 60_000,          # Protect more
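
To see what these presets change in practice, the settings can be modeled as a small dataclass that derives the token count at which compaction triggers. `ContextConfig` and its two properties are hypothetical helpers written for this example; only the field names and defaults come from the Configuration Options table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContextConfig:
    """Hypothetical container mirroring the documented settings."""
    model_context_limit: int = 200_000
    output_token_max: int = 32_000
    auto_compact_threshold: float = 0.85
    prune_protect: int = 40_000
    prune_minimum: int = 20_000
    max_output_tokens: int = 2_500

    @property
    def usable_context(self) -> int:
        return self.model_context_limit - self.output_token_max

    @property
    def compaction_trigger(self) -> int:
        return round(self.usable_context * self.auto_compact_threshold)


default = ContextConfig()
long_task = replace(default, auto_compact_threshold=0.70, prune_protect=30_000)
complex_task = replace(default, auto_compact_threshold=0.90, prune_protect=60_000)

print(default.compaction_trigger)       # 142800
print(long_task.compaction_trigger)     # 117600
print(complex_task.compaction_trigger)  # 151200
```

With the default limits, the long-task preset starts managing context about 25K tokens earlier than the default, while the complex-task preset leaves roughly 17K tokens of headroom before the usable limit.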

Monitoring Context Usage

BaseAgent logs context status each iteration:

[14:30:16] [compaction] Context: 45000 tokens (26.8% of 168000)
[14:35:22] [compaction] Context: 125000 tokens (74.4% of 168000)
[14:38:45] [compaction] Context: 148000 tokens (88.1% of 168000)
[14:38:45] [compaction] Context overflow detected, managing...
[14:38:45] [compaction] Prune scan: 95000 total tokens, 55000 prunable
[14:38:45] [compaction] Pruning 12 tool outputs, recovering ~55000 tokens
[14:38:46] [compaction] Pruning sufficient: 148000 -> 93000 tokens
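
For longer sessions it can help to summarize these lines rather than read them one by one. The sketch below assumes the log format is exactly as shown above; `summarize_log` is an illustrative helper, not part of BaseAgent.

```python
import re

# Matches lines such as "Context: 45000 tokens (26.8% of 168000)"
USAGE_RE = re.compile(r"Context: (\d+) tokens \(([\d.]+)% of (\d+)\)")


def summarize_log(lines):
    """Return the peak token count and the number of overflow events."""
    peak = 0
    overflows = 0
    for line in lines:
        match = USAGE_RE.search(line)
        if match:
            peak = max(peak, int(match.group(1)))
        elif "Context overflow detected" in line:
            overflows += 1
    return {"peak_tokens": peak, "overflow_events": overflows}


sample = [
    "[14:30:16] [compaction] Context: 45000 tokens (26.8% of 168000)",
    "[14:35:22] [compaction] Context: 125000 tokens (74.4% of 168000)",
    "[14:38:45] [compaction] Context: 148000 tokens (88.1% of 168000)",
    "[14:38:45] [compaction] Context overflow detected, managing...",
]
print(summarize_log(sample))  # {'peak_tokens': 148000, 'overflow_events': 1}
```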

Best Practices

1. Keep Tool Outputs Focused

# ❌ Too much output
ls -laR /  # Lists entire filesystem

# ✅ Targeted
ls -la /workspace/src/  # Just what's needed

2. Use Appropriate Search Patterns

# ❌ Too broad
grep "function"  # Matches everything

# ✅ Specific
grep "def calculate_total" src/billing.py

3. Read Sections, Not Entire Files

// ❌ Entire large file
{"name": "read_file", "arguments": {"file_path": "huge.py"}}

// ✅ Specific section
{"name": "read_file", "arguments": {"file_path": "huge.py", "offset": 100, "limit": 50}}
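
To illustrate why the second call is cheaper, here is one way a sectioned read could work. It assumes `offset` is a zero-based line index and `limit` is a line count; the `read_file` tool's real semantics may differ, and `read_file_section` is a hypothetical stand-in.

```python
import os
import tempfile
from itertools import islice


def read_file_section(file_path: str, offset: int = 0, limit: int = 50) -> str:
    """Return `limit` lines starting at zero-based line `offset`."""
    with open(file_path, encoding="utf-8") as f:
        # islice reads lazily and stops once the requested window is consumed
        return "".join(islice(f, offset, offset + limit))


# Demo on a throwaway 200-line file
with tempfile.NamedTemporaryFile(
    "w", suffix=".py", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(200))

section = read_file_section(tmp.name, offset=100, limit=50)
os.unlink(tmp.name)

lines = section.splitlines()
print(lines[0], lines[-1], len(lines))  # line 100 line 149 50
```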

4. Monitor Long Sessions

For tasks exceeding 50 iterations, watch for:

- Repeated compaction events
- Context oscillating near threshold
- Loss of important context after compaction

Next Steps

- Best Practices - Optimization strategies
- Configuration - Tuning options
- Architecture - System design
