
Summarization

The Summarization System compresses long conversations to keep Swisper fast and cost-efficient. When a conversation exceeds either configurable threshold (20 messages or roughly 4,000 tokens), the system generates a concise summary of the older messages, keeps the most recent exchanges verbatim, and regenerates the chat title to reflect how the topic has evolved.
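The threshold check and the split into "summarize" versus "keep verbatim" can be sketched as follows. This is a minimal illustration, not Swisper's implementation. Only the 20-message and ~4,000-token thresholds come from the text. The `Message` type, the helper names, and the four-characters-per-token estimate are assumptions. The number of recent messages kept verbatim is assumed to be 4, matching the smart-loading window.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds from the text; in Swisper these are configurable.
MAX_MESSAGES = 20
MAX_TOKENS = 4_000
KEEP_RECENT = 4  # assumption: mirrors the "last 4 messages" smart-loading window


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(messages: list[Message]) -> int:
    """Rough token estimate (assumption: ~4 characters per token)."""
    return sum(len(m.content) for m in messages) // 4


def should_summarize(messages: list[Message],
                     max_messages: int = MAX_MESSAGES,
                     max_tokens: int = MAX_TOKENS) -> bool:
    """True when the message count OR the token estimate exceeds its threshold."""
    return len(messages) > max_messages or estimate_tokens(messages) > max_tokens


def split_for_summary(messages: list[Message], keep_recent: int = KEEP_RECENT):
    """Return (older messages to summarize, recent messages kept verbatim)."""
    if keep_recent <= 0:
        # Guard: messages[:-0] would be empty, not the full list.
        return list(messages), []
    return messages[:-keep_recent], messages[-keep_recent:]
```

With 21 short messages, `should_summarize` returns `True` on the count alone, and `split_for_summary` hands the oldest 17 to the summarizer while keeping the newest 4 as-is.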

The system uses smart loading: when a summary exists, only the summary and the last 4 messages are loaded from the database, which avoids reading the full history of long conversations.
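A minimal sketch of the smart-loading query follows. It uses Python's built-in `sqlite3` purely for illustration. The schema (`conversations.summary`, `messages.conversation_id`) and the `load_context` helper are hypothetical, and Swisper's actual storage layer may differ. When a summary exists, the query fetches only the newest rows with `ORDER BY id DESC LIMIT ?` and reverses them into chronological order. Otherwise it reads the full history.

```python
from __future__ import annotations

import sqlite3

RECENT_WINDOW = 4  # "the last 4 messages" from the text


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE conversations (id INTEGER PRIMARY KEY, summary TEXT, title TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY AUTOINCREMENT,
                               conversation_id INTEGER, role TEXT, content TEXT);
    """)
    return conn


def load_context(conn: sqlite3.Connection, conversation_id: int,
                 recent: int = RECENT_WINDOW) -> tuple[str | None, list[tuple[str, str]]]:
    """Return (summary, messages); reads only the recent window when a summary exists."""
    row = conn.execute(
        "SELECT summary FROM conversations WHERE id = ?", (conversation_id,)
    ).fetchone()
    summary = row[0] if row else None

    if summary is not None:
        # Smart loading: newest `recent` rows only, fetched newest-first...
        rows = conn.execute(
            "SELECT role, content FROM messages WHERE conversation_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (conversation_id, recent),
        ).fetchall()
        rows.reverse()  # ...then restored to chronological order.
    else:
        # No summary yet: the full history is still needed.
        rows = conn.execute(
            "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
            (conversation_id,),
        ).fetchall()
    return summary, rows
```

The cost of a read therefore stays bounded by the window size once a conversation has been summarized, regardless of how many messages it has accumulated.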

Key Components

Component             Purpose
Summarization Check   Evaluates the message count and token estimate against the thresholds
Summarization Node    Generates the summary via an LLM and regenerates the chat title (computation only, no DB writes)
Smart Loading         Loads only the summary and recent messages when a summary exists, avoiding full-history reads
Message Persist       Writes the summary and title to the database atomically at the end of the turn
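The split between a computation-only Summarization Node and an atomic Message Persist step might look roughly like this. It is a sketch under stated assumptions. The state keys (`older_messages`, `summary`), the `llm` callable, the prompts, and the `conversations` table are all hypothetical. The node only returns a state update. The persist step writes summary and title together inside one `sqlite3` transaction, which `with conn:` commits on success and rolls back on error.

```python
from __future__ import annotations

import sqlite3
from typing import Callable

# Hypothetical LLM hook: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def summarization_node(state: dict, llm: LLM) -> dict:
    """Computation only: builds a state update and performs no DB writes."""
    transcript = "\n".join(f"{role}: {content}" for role, content in state["older_messages"])
    prior = state.get("summary") or ""
    summary = llm(f"Update this summary:\n{prior}\n\nwith these messages:\n{transcript}")
    # The title is regenerated from the new summary so it tracks the evolved topic.
    title = llm(f"Write a short chat title for:\n{summary}")
    return {"summary": summary, "title": title}


def persist_turn(conn: sqlite3.Connection, conversation_id: int, update: dict) -> None:
    """End of turn: write summary and title together in a single transaction."""
    # `with conn:` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE conversations SET summary = ?, title = ? WHERE id = ?",
            (update["summary"], update["title"], conversation_id),
        )
```

Keeping the node free of side effects means a failed or retried turn leaves no half-written summary behind; the database only changes when the persist step commits.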

Documentation Sections

  • Overview — What this module does and who it serves
  • Architecture — System design, components, and trade-offs