Context

Smart context management for long conversations with automatic compression

Overview

Auto context compression in Astreus manages long conversation histories automatically. It compresses older messages while preserving important information, so agents can keep long conversations coherent without exceeding the model's token limit.

Basic Usage

Set autoContextCompression to true when creating an agent to enable it:

import { Agent } from '@astreus-ai/astreus';

// Create an agent with auto context compression enabled
const agent = await Agent.create({
  name: 'ContextAwareAgent',
  model: 'gpt-4o',
  autoContextCompression: true  // Enable smart context management
});

// Long conversations are automatically managed
for (let i = 1; i <= 50; i++) {
  const response = await agent.ask(`Tell me fact #${i} about TypeScript`);
  console.log(`Fact ${i}:`, response);
}

// Agent can still reference early conversation through compressed context
const summary = await agent.ask('What was the first fact you told me?');
console.log(summary); // System retrieves from compressed context

Example with Tasks

Auto context compression works with both direct conversations and tasks:

const agent = await Agent.create({
  name: 'ResearchAgent',
  model: 'gpt-4o',
  autoContextCompression: true,
  memory: true // Often used together with memory
});

// Create multiple related tasks
const task1 = await agent.createTask({
  prompt: "Research the latest trends in AI development"
});

const result1 = await agent.executeTask(task1.id);

const task2 = await agent.createTask({
  prompt: "Based on the research, what are the key opportunities?"
});

const result2 = await agent.executeTask(task2.id);
// Task can reference previous context even if it was compressed

Auto context compression lets agents handle long conversations and long sequences of tasks while staying coherent and within the model's token limit.

Configuration Options

You can customize the auto context compression behavior with these parameters:

const agent = await Agent.create({
  name: 'CustomContextAgent',
  model: 'gpt-4o',
  autoContextCompression: true,
  
  // Context compression configuration
  maxContextLength: 4000,           // Trigger compression at 4000 tokens
  preserveLastN: 5,                 // Keep last 5 messages uncompressed
  compressionRatio: 0.4,            // Target 40% of original size (60% reduction)
  compressionStrategy: 'hybrid',    // Use hybrid compression strategy
  
  memory: true,
});

Configuration Parameters

Parameter	Type	Default	Description
`autoContextCompression`	`boolean`	`false`	Enable automatic context compression
`maxContextLength`	`number`	`8000`	Token limit before compression triggers
`preserveLastN`	`number`	`3`	Number of recent messages to keep uncompressed
`compressionRatio`	`number`	`0.3`	Target compression ratio (0.1 = 90% reduction)
`compressionStrategy`	`string`	`'hybrid'`	Compression algorithm to use
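
The defaults in the table can be captured in a small helper that merges user overrides and rejects obviously invalid values. This is only a sketch: `CompressionOptions`, `DEFAULTS`, and `resolveCompressionOptions` are hypothetical names that are not part of Astreus, and the accepted range for `compressionRatio` is an assumption based on its definition as compressed tokens divided by original tokens.

```typescript
// Hypothetical helper, not part of Astreus: applies the documented defaults
// and rejects values that make no sense for the documented parameters.
type CompressionStrategy = 'summarize' | 'selective' | 'hybrid';

interface CompressionOptions {
  autoContextCompression: boolean;
  maxContextLength: number;
  preserveLastN: number;
  compressionRatio: number;
  compressionStrategy: CompressionStrategy;
}

// Defaults copied from the parameter table.
const DEFAULTS: CompressionOptions = {
  autoContextCompression: false,
  maxContextLength: 8000,
  preserveLastN: 3,
  compressionRatio: 0.3,
  compressionStrategy: 'hybrid',
};

function resolveCompressionOptions(
  overrides: Partial<CompressionOptions> = {}
): CompressionOptions {
  const options = { ...DEFAULTS, ...overrides };

  if (!Number.isInteger(options.maxContextLength) || options.maxContextLength <= 0) {
    throw new RangeError('maxContextLength must be a positive integer');
  }
  if (!Number.isInteger(options.preserveLastN) || options.preserveLastN < 0) {
    throw new RangeError('preserveLastN must be a non-negative integer');
  }
  // Assumed range: the ratio is compressed/original, so it should lie in (0, 1].
  if (!(options.compressionRatio > 0 && options.compressionRatio <= 1)) {
    throw new RangeError('compressionRatio must be in the range (0, 1]');
  }
  return options;
}
```

Validating before calling `Agent.create` would surface a mistyped ratio such as 30 instead of 0.3 early, rather than leaving it to show up as unexpected compression behavior.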

Compression Mathematics

The compression ratio determines how much the context is reduced:

$\text{Compression Ratio} = \frac{\text{compressed tokens}}{\text{original tokens}}$

For example, with a ratio of 0.3:

Original: 1000 tokens
Compressed: 300 tokens
Reduction: 70%

The token reduction percentage is calculated as: $\text{Reduction \%} = (1 - \text{ratio}) \times 100\%$

With compressionRatio = 0.3: $\text{Reduction} = (1 - 0.3) \times 100\% = 70\%$
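
The same arithmetic can be written as two small functions. The helper names below are hypothetical and exist only to illustrate the formulas; they are not Astreus APIs.

```typescript
// Illustrative arithmetic only; these helper names are hypothetical.
function targetCompressedTokens(originalTokens: number, compressionRatio: number): number {
  // ratio = compressed / original, so compressed = original * ratio
  return Math.round(originalTokens * compressionRatio);
}

function reductionPercent(compressionRatio: number): number {
  // Reduction % = (1 - ratio) * 100
  return (1 - compressionRatio) * 100;
}

console.log(targetCompressedTokens(1000, 0.3));       // 300
console.log(`${reductionPercent(0.3).toFixed(0)}%`);  // "70%"
```

Note that a smaller ratio means heavier compression: 0.1 keeps about 10% of the original tokens, while 0.5 keeps about half.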

Compression Strategies

Choose the compression strategy that best fits your use case:

`'summarize'` - Text Summarization

Best for: General conversations, Q&A, discussions
How it works: Creates concise summaries of message groups
Pros: Maintains context flow, good for most use cases
Cons: May lose specific details

const agent = await Agent.create({
  name: 'SummarizingAgent',
  autoContextCompression: true,
  compressionStrategy: 'summarize',
  preserveLastN: 4
});

`'selective'` - Important Message Selection

Best for: Task-oriented conversations, technical discussions
How it works: Uses AI to identify and preserve important messages
Pros: Keeps crucial information intact
Cons: May be more resource intensive

const agent = await Agent.create({
  name: 'SelectiveAgent',
  autoContextCompression: true,
  compressionStrategy: 'selective',
  preserveLastN: 3
});

`'hybrid'` - Combined Approach (Recommended)

Best for: Most applications, balanced approach
How it works: Combines summarization and selective preservation
Pros: Balanced between context preservation and efficiency
Cons: None significant

const agent = await Agent.create({
  name: 'HybridAgent',
  autoContextCompression: true,
  compressionStrategy: 'hybrid', // Default and recommended
});

Advanced Usage

Custom Compression Settings by Use Case

High-Frequency Conversations

For chatbots or interactive agents with many short messages:

const chatbot = await Agent.create({
  name: 'Chatbot',
  autoContextCompression: true,
  maxContextLength: 2000,     // Compress more frequently
  preserveLastN: 8,           // Keep more recent messages
  compressionRatio: 0.5,      // Lighter compression than the 0.3 default (50% reduction)
  compressionStrategy: 'summarize'
});

Long-Form Content Creation

For agents working with detailed content:

const writer = await Agent.create({
  name: 'ContentWriter',
  autoContextCompression: true,
  maxContextLength: 12000,    // Allow longer context
  preserveLastN: 3,           // Keep recent context tight
  compressionRatio: 0.2,      // Heavier compression than the 0.3 default (80% reduction)
  compressionStrategy: 'selective'
});

Technical Documentation

For agents handling complex technical discussions:

const techAgent = await Agent.create({
  name: 'TechnicalAssistant',
  autoContextCompression: true,
  maxContextLength: 6000,
  preserveLastN: 5,           
  compressionRatio: 0.3,      
  compressionStrategy: 'hybrid' // Best for mixed content
});

How Context Compression Works

Compression Process

1. Token Monitoring: The agent continuously tracks the total token count of the conversation.
2. Trigger Point: When the token count exceeds maxContextLength, compression is triggered.
3. Message Preservation: The most recent preserveLastN messages are kept uncompressed.
4. Content Analysis: Older messages are analyzed according to the chosen strategy.
5. Compression: The older messages are condensed into summaries or a selection of important messages.
6. Context Update: The compressed context replaces the original messages.
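
The process above can be sketched in a few lines of self-contained TypeScript. Everything here is an assumption made for illustration: `Message`, `estimateTokens`, and `compressIfNeeded` are hypothetical names, the four-characters-per-token estimate is a rough stand-in for a real tokenizer, and the `summarize` callback stands in for whichever strategy is configured. Astreus' actual implementation may differ.

```typescript
// Minimal sketch of the documented process; all names are hypothetical.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Crude token estimate (about 4 characters per token); real tokenizers are model-specific.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

const totalTokens = (messages: Message[]): number =>
  messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);

function compressIfNeeded(
  messages: Message[],
  maxContextLength: number,
  preserveLastN: number,
  summarize: (older: Message[]) => string // stand-in for the chosen strategy
): Message[] {
  // Token monitoring and trigger point: do nothing while under the limit.
  if (totalTokens(messages) <= maxContextLength) return messages;

  // System prompts are always preserved.
  const systemPrompts = messages.filter((m) => m.role === 'system');
  const conversation = messages.filter((m) => m.role !== 'system');

  // Message preservation: keep the most recent preserveLastN messages as-is.
  const splitAt = Math.max(0, conversation.length - preserveLastN);
  const older = conversation.slice(0, splitAt);
  const recent = conversation.slice(splitAt);
  if (older.length === 0) return messages; // nothing old enough to compress

  // Content analysis and compression: condense the older messages.
  const summary: Message = { role: 'system', content: `[Compressed] ${summarize(older)}` };

  // Context update: the compressed context replaces the original messages.
  return [...systemPrompts, summary, ...recent];
}
```

Returning the original array unchanged when no compression is needed keeps the common path cheap and makes it easy for callers to detect whether anything happened.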

What Gets Preserved

System prompts: Always preserved
Recent messages: Last N messages based on preserveLastN
Important context: Key information identified by the compression strategy
Compressed summaries: Condensed versions of older conversations

Example Compression Flow

// Before compression (1200 tokens)
[
  { role: 'user', content: 'Tell me about TypeScript' },
  { role: 'assistant', content: 'TypeScript is...' },
  { role: 'user', content: 'What about interfaces?' },
  { role: 'assistant', content: 'Interfaces in TypeScript...' },
  { role: 'user', content: 'Show me an example' },
  { role: 'assistant', content: 'Here\'s an example...' },
]

// After compression (400 tokens)
[
  { role: 'system', content: '[Compressed] User asked about TypeScript basics, interfaces, and examples. Assistant provided comprehensive explanations...' },
  { role: 'user', content: 'Show me an example' },
  { role: 'assistant', content: 'Here\'s an example...' },
]

Monitoring and Debugging

Context Window Information

Get details about the current context state:

const contextWindow = agent.getContextWindow();

console.log({
  messageCount: contextWindow.messages.length,
  totalTokens: contextWindow.totalTokens,
  maxTokens: contextWindow.maxTokens,
  utilization: `${contextWindow.utilizationPercentage.toFixed(1)}%`
});

// Check if compression occurred
const hasCompression = contextWindow.messages.some(
  msg => msg.metadata?.type === 'summary'
);
console.log('Context compressed:', hasCompression);

Context Analysis

Analyze context for optimization opportunities:

const analysis = agent.analyzeContext();

console.log({
  compressionNeeded: analysis.compressionNeeded,
  averageTokensPerMessage: analysis.averageTokensPerMessage,
  suggestedCompressionRatio: analysis.suggestedCompressionRatio
});

Response Types

Context management methods return detailed objects for monitoring and controlling conversation context.

Context Window Response

Get the current context window with utilization metrics:

const window = agent.getContextWindow();

// Response structure:
{
  messages: [
    {
      role: "user",
      content: "How do I use TypeScript?",
      timestamp: new Date('2024-01-15T10:00:00Z'),
      tokens: 8
    },
    {
      role: "assistant",
      content: "TypeScript is a typed superset of JavaScript that compiles to plain JavaScript...",
      timestamp: new Date('2024-01-15T10:00:05Z'),
      tokens: 50
    }
    // ... more messages
  ],
  totalTokens: 3500,
  maxTokens: 8000,
  utilizationPercentage: 43.75
}

Context Analysis Response

Analyze current context usage and compression needs:

const analysis = agent.analyzeContext();

// Response structure:
{
  totalTokens: 6500,
  messageCount: 15,
  averageTokensPerMessage: 433,
  contextUtilization: 0.8125,              // 81.25% of max context used
  compressionNeeded: true,
  suggestedCompressionRatio: 0.5           // Suggested ratio: compress to 50% of original size
}
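
The derived fields in this sample follow from simple arithmetic on the token totals. The sketch below reproduces them; `analyze` is a hypothetical function, the 8000-token maximum is taken from the default `maxContextLength`, and the 0.8 threshold used to set `compressionNeeded` is an assumption, since the rule Astreus applies is not documented here.

```typescript
// Hypothetical re-derivation of the sample fields; not Astreus source code.
interface ContextAnalysis {
  totalTokens: number;
  messageCount: number;
  averageTokensPerMessage: number;
  contextUtilization: number;
  compressionNeeded: boolean;
}

function analyze(
  totalTokens: number,
  messageCount: number,
  maxTokens: number,
  threshold = 0.8 // assumed cut-off for flagging compression; not documented
): ContextAnalysis {
  const contextUtilization = totalTokens / maxTokens;
  return {
    totalTokens,
    messageCount,
    averageTokensPerMessage: messageCount > 0 ? Math.round(totalTokens / messageCount) : 0,
    contextUtilization,
    compressionNeeded: contextUtilization >= threshold,
  };
}

console.log(analyze(6500, 15, 8000));
```

With 6500 tokens across 15 messages and a maximum of 8000, this yields an average of 433 tokens per message and a utilization of 0.8125, matching the sample values.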

Compression Result Response

Compress context and get detailed compression metrics:

const compression = await agent.compressContext();

// Response structure:
{
  success: true,
  compressedMessages: [
    {
      role: "system",
      content: "Summary: User asked about TypeScript features. Discussed types, interfaces, and generics...",
      timestamp: new Date('2024-01-15T10:05:00Z'),
      tokens: 35
    },
    {
      role: "user",
      content: "Can you explain decorators?",
      timestamp: new Date('2024-01-15T10:10:00Z'),
      tokens: 8
    }
    // ... compressed messages (8 instead of 15)
  ],
  tokensReduced: 3250,                     // Tokens saved
  compressionRatio: 0.5,                   // 50% reduction achieved
  strategy: "summarize"                    // Strategy used
}

// On failure:
{
  success: false,
  compressedMessages: [],
  tokensReduced: 0,
  compressionRatio: 0,
  error: "Compression failed: Minimum context threshold not reached"
}
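
Because the result carries a `success` flag, callers can branch on it before reading the metrics. The sketch below assumes the result shape shown in the two samples above; the `CompressionResult` interface and `describeCompression` helper are hypothetical and not exported by Astreus.

```typescript
// Shape inferred from the sample objects; the field list is an assumption.
interface CompressionResult {
  success: boolean;
  compressedMessages: unknown[];
  tokensReduced: number;
  compressionRatio: number;
  strategy?: string;
  error?: string;
}

function describeCompression(result: CompressionResult): string {
  if (!result.success) {
    // On failure the metrics are zeroed, so only the error is worth reporting.
    return `Compression skipped or failed: ${result.error ?? 'unknown error'}`;
  }
  return (
    `Saved ${result.tokensReduced} tokens using "${result.strategy ?? 'unknown'}" ` +
    `(${result.compressedMessages.length} messages remain)`
  );
}
```

A failure such as the threshold error in the sample is not necessarily a problem: it may simply mean the context is still too small to be worth compressing.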

Context Summary Response

Generate an AI-powered summary of the conversation:

const summary = await agent.generateContextSummary();

// Response structure:
{
  mainTopics: [
    "TypeScript development",
    "API design patterns",
    "Testing strategies"
  ],
  keyEntities: [
    "Express.js",
    "Jest",
    "PostgreSQL",
    "Docker"
  ],
  conversationFlow: "Discussion started with TypeScript setup and configuration. Moved to API design patterns using Express.js. Covered database integration with PostgreSQL. Concluded with comprehensive testing strategies using Jest and continuous integration.",
  importantFacts: [
    "User prefers functional programming style",
    "Project deadline is March 15th, 2024",
    "Must support Node.js 18+",
    "Team size is 5 developers"
  ],
  actionItems: [
    "Set up Jest test framework with coverage reporting",
    "Create API documentation using OpenAPI/Swagger",
    "Configure Docker containers for development environment",
    "Implement CI/CD pipeline with GitHub Actions"
  ]
}

Get Context Messages Response

Retrieve all context messages as an array:

const messages = agent.getContext();
// OR, using the alternative method name:
// const messages = agent.getContextMessages();

// Response structure:
[
  {
    role: "user",
    content: "How do I use async/await?",
    timestamp: new Date('2024-01-15T09:30:00Z'),
    tokens: 10
  },
  {
    role: "assistant",
    content: "Async/await is syntactic sugar for promises...",
    timestamp: new Date('2024-01-15T09:30:15Z'),
    tokens: 85
  }
  // ... more messages
]

Export Context Response

Export context returns a JSON string:

const exported = agent.exportContext();

// Response: JSON string
'{"messages":[{"role":"user","content":"...","timestamp":"2024-01-15T10:00:00.000Z","tokens":10},...],"metadata":{"exportedAt":"2024-01-15T11:00:00.000Z","totalTokens":3500}}'
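
Since the export is a JSON string, timestamps arrive as ISO 8601 strings rather than Date objects, and the payload should be checked before use. The sketch below assumes the shape shown in the sample string; `ExportedContext` and `parseExportedContext` are hypothetical names, and the sample payload is made up for illustration.

```typescript
// Shape inferred from the sample string; treat it as an assumption.
interface ExportedContext {
  messages: { role: string; content: string; timestamp: string; tokens: number }[];
  metadata: { exportedAt: string; totalTokens: number };
}

function parseExportedContext(json: string): ExportedContext {
  const data = JSON.parse(json) as Partial<ExportedContext> | null;
  // Minimal structural check before trusting the payload.
  if (!data || !Array.isArray(data.messages) || typeof data.metadata?.totalTokens !== 'number') {
    throw new Error('Unexpected export format');
  }
  return data as ExportedContext;
}

// Hypothetical payload for illustration only.
const sample = JSON.stringify({
  messages: [
    { role: 'user', content: 'Hello', timestamp: '2024-01-15T10:00:00.000Z', tokens: 2 },
  ],
  metadata: { exportedAt: '2024-01-15T11:00:00.000Z', totalTokens: 2 },
});

const parsed = parseExportedContext(sample);
// Timestamps are strings after JSON.parse; convert them when a Date is needed.
const exportedAt = new Date(parsed.metadata.exportedAt);
console.log(parsed.messages.length, exportedAt.toISOString());
```

A check like this is useful before passing a stored string back to `importContext`, because a truncated or hand-edited export would otherwise fail at a less obvious point.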

Import/Clear Context Response

Import and clear operations return void:

// Import context
agent.importContext(jsonString);
// Returns: void

// Clear context
agent.clearContext();
// Returns: void

Last updated: July 20, 2026
