Summarization — Architecture
This content was migrated from Documentation/SUMMARIZATION_SYSTEM.md and
restructured into audience sections. Review for accuracy against
the current codebase.
Context and Purpose
Long conversations increase token cost and latency linearly. The Summarization System caps this growth by compressing older messages into a fixed-size summary while keeping recent exchanges verbatim.
Driving requirements:
- Bounded context size — Token usage must not grow unboundedly with conversation length
- No information cliff — Summarization must preserve key decisions, facts, and unresolved items rather than truncating
- Computation-only node — The summarization node must not write to the database directly; persistence happens atomically at end-of-turn
Architecture Overview
flowchart TD
subgraph Entry["Session Start"]
SI[Session Init] --> CHECK{Summary exists?}
CHECK -->|yes| SMART["Load: summary + last 4 msgs"]
CHECK -->|no| FULL["Load: all messages"]
end
subgraph Trigger["Summarization Check"]
SMART --> SC[Summarization Check]
FULL --> SC
SC --> NEED{">20 msgs OR >4000 tokens?"}
NEED -->|yes| SUM[Summarization Node]
NEED -->|no| CL[Context Loader]
end
subgraph Summarize["Summarization"]
SUM -->|LLM call| GEN[Generate Summary]
GEN --> TITLE[Regenerate Title]
TITLE --> CL
end
subgraph Persist["End of Turn"]
MP[Message Persist Node] -->|atomic write| DB[(PostgreSQL)]
end
Component Responsibilities
| Component | Responsibility |
| --- | --- |
| Session Init | Smart-loads chat history: summary + last 4 messages if a summary exists, all messages otherwise |
| Summarization Check | Evaluates message count and token estimate against thresholds. Sets the needs_summarization flag |
| Summarization Node | Generates summary via LLM (keeps last 4 messages verbatim). Regenerates chat title. Computation-only: no DB writes |
| Message Persist Node | Writes summary and title to database atomically at end of turn |
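
The check node's rule can be sketched in a few lines. This is a minimal illustration, assuming the thresholds shown in the flowchart (more than 20 messages or more than 4000 tokens) and the roughly 4 characters per token heuristic listed under known trade-offs. The function names, the `Message` shape, and the dict-in, dict-out node signature are assumptions for illustration, not the project's actual code.

```python
from typing import TypedDict

MAX_MESSAGES = 20    # flowchart threshold: more than 20 messages
MAX_TOKENS = 4000    # flowchart threshold: more than 4000 estimated tokens
CHARS_PER_TOKEN = 4  # rough heuristic; see the trade-offs table


class Message(TypedDict):
    # Hypothetical message shape used only in this sketch.
    role: str
    content: str


def estimate_tokens(messages: list[Message]) -> int:
    """Approximate the token count as total characters divided by four."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // CHARS_PER_TOKEN


def summarization_check(state: dict) -> dict:
    """Return a partial state update that sets needs_summarization."""
    messages = state.get("messages_history", [])
    over_count = len(messages) > MAX_MESSAGES
    over_tokens = estimate_tokens(messages) > MAX_TOKENS
    return {"needs_summarization": over_count or over_tokens}
```

Either condition alone is enough to trigger summarization, so a short conversation with a few very long messages is compressed just like a long conversation of short ones.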
Data Model
| Field | Location | Purpose |
| --- | --- | --- |
| conversation_summary | chats table + state | Compressed conversation history |
| needs_summarization | state only | Boolean flag set by check node |
| summarization_occurred | state only | Flag for message_persist_node to know a write is needed |
| chat_title | chats table + state | Regenerated title |
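
One way to picture the split between persisted and state-only fields is a typed state dictionary. The sketch below is an assumption about shape only: the class name `SummarizationState`, the `PERSISTED_FIELDS` tuple, and the `persisted_view` helper are hypothetical, while the field names come from the table above.

```python
from typing import Optional, TypedDict


class SummarizationState(TypedDict, total=False):
    messages_history: list[dict]         # loaded by Session Init
    conversation_summary: Optional[str]  # chats table + state
    chat_title: Optional[str]            # chats table + state
    needs_summarization: bool            # state only, set by the check node
    summarization_occurred: bool         # state only, read by message_persist_node


# Fields that also live in the chats table; the flags never reach the database.
PERSISTED_FIELDS = ("conversation_summary", "chat_title")


def persisted_view(state: SummarizationState) -> dict:
    """Return only the fields that would be written to the chats table."""
    return {key: state[key] for key in PERSISTED_FIELDS if key in state}
```

Keeping the two flags out of the persisted view reflects their role as per-turn signals rather than stored data.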
Key Design Decisions
1. Computation-Only Node
- Chosen: Summarization node only computes; all DB writes happen in message_persist_node
- Rejected: Writing summary directly to DB in the summarization node
- Rationale: Atomic persistence at end-of-turn avoids partial writes if the graph fails after summarization but before response generation
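
The separation can be sketched as two functions: one that only returns new state values, and one that writes them in a single transaction. This is an illustrative sketch, not the project's implementation. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, stubs out the LLM call, and assumes a `chats` table with `id`, `conversation_summary`, and `chat_title` columns; the function signatures are hypothetical.

```python
import sqlite3


def summarization_node(state: dict) -> dict:
    """Compute only: return new state values without touching the database."""
    # Stub standing in for the LLM call made by the real node.
    summary = "summary of %d messages" % len(state["messages_history"])
    return {
        "conversation_summary": summary,
        "chat_title": "Untitled chat",  # placeholder for the regenerated title
        "summarization_occurred": True,
    }


def message_persist_node(conn: sqlite3.Connection, chat_id: int, state: dict) -> None:
    """Write summary and title together, only if summarization happened."""
    if not state.get("summarization_occurred"):
        return
    # The connection context manager commits on success and rolls back
    # if an exception is raised, so both columns change together or not at all.
    with conn:
        conn.execute(
            "UPDATE chats SET conversation_summary = ?, chat_title = ? WHERE id = ?",
            (state["conversation_summary"], state["chat_title"], chat_id),
        )
```

If the graph fails between the two calls, the database still holds the previous summary and title, which is the failure mode the rationale above is designed for.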
2. Smart Loading Over Full Loading
- Chosen: When a summary exists, load only summary + last 4 messages
- Rejected: Always load all messages and summarize in-place
- Rationale: Loading 30 messages just to discard 26 wastes database reads. Smart loading caps the read at the summary plus the last 4 messages, regardless of conversation length
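
A minimal sketch of the smart-loading branch follows, assuming a `messages` table keyed by an increasing `id` and a `chats` table holding the summary. The schema, the function name `load_history`, and the use of `sqlite3` in place of PostgreSQL are all assumptions made for illustration.

```python
import sqlite3

RECENT_MESSAGE_COUNT = 4  # messages kept verbatim alongside the summary


def load_history(conn: sqlite3.Connection, chat_id: int) -> dict:
    """Load summary + last 4 messages if a summary exists, else all messages."""
    row = conn.execute(
        "SELECT conversation_summary FROM chats WHERE id = ?", (chat_id,)
    ).fetchone()
    summary = row[0] if row else None

    if summary:
        # Read only the newest rows, then restore chronological order.
        rows = conn.execute(
            "SELECT role, content FROM messages WHERE chat_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (chat_id, RECENT_MESSAGE_COUNT),
        ).fetchall()
        rows.reverse()
    else:
        rows = conn.execute(
            "SELECT role, content FROM messages WHERE chat_id = ? ORDER BY id",
            (chat_id,),
        ).fetchall()

    return {
        "conversation_summary": summary,
        "messages_history": [{"role": r, "content": c} for r, c in rows],
    }
```

Pushing the limit into the query is what saves the reads; fetching everything and slicing in application code would produce the same state but none of the I/O savings.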
3. Iterative Summarization
- Chosen: New summaries incorporate the previous summary content
- Rejected: Re-summarizing from scratch each time
- Rationale: Ensures context from very early in the conversation is preserved across multiple summarization cycles
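
Iterative summarization amounts to feeding the previous summary back into the next prompt along with the messages that are about to be compressed. The sketch below shows one possible shape under stated assumptions: the helper names and the prompt wording are hypothetical, and the actual prompt and the call through llm_adapter.get_structured_output() are not shown.

```python
from typing import Optional

KEEP_RECENT = 4  # the last 4 messages stay verbatim


def split_for_summary(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split history into (older messages to summarize, recent messages to keep)."""
    if len(messages) <= KEEP_RECENT:
        return [], list(messages)
    return messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]


def build_summary_prompt(previous_summary: Optional[str], older: list[dict]) -> str:
    """Build a prompt that folds older messages into the previous summary."""
    parts = []
    if previous_summary:
        # Carrying the old summary forward preserves early context.
        parts.append("Previous summary:\n" + previous_summary)
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    parts.append("New messages to fold in:\n" + transcript)
    parts.append(
        "Write an updated summary that keeps key decisions, facts, "
        "and unresolved items."
    )
    return "\n\n".join(parts)
```

Because each cycle starts from the prior summary rather than the raw history, the cost of a summarization call depends on the messages added since the last cycle, not on the full conversation length.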
Interfaces and Contracts
| Interface | Direction | Format | Consumer |
| --- | --- | --- | --- |
| Session Init → Summarization Check | Outbound | messages_history + conversation_summary in state | Summarization Check node |
| Summarization → State | Outbound | conversation_summary, chat_title, summarization_occurred | All downstream nodes, Message Persist |
| Summarization → LLM | Outbound | Structured output via llm_adapter.get_structured_output() | DeepSeek v32 (agent_type: conversation_summarization) |
| Title Generation → LLM | Outbound | Via chat_service.generate_title_from_summary() | Llama 4 Maverick (agent_type: title_generation) |
Known Trade-offs and Debt
| Item | Impact | Remediation |
| --- | --- | --- |
| Token estimation heuristic | Uses ~4 chars/token, inaccurate for non-Latin scripts or code-heavy conversations | Use a proper tokenizer (tiktoken or model-specific) for accurate counts |
| No summary quality validation | No check that the generated summary actually captures key information | Add a summary quality score or user feedback mechanism |
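
One low-risk path toward the tokenizer remediation is to make the counter pluggable, so the heuristic stays the default and a real tokenizer can be injected later. The sketch below is an assumption about how that could look; `TokenCounter`, `heuristic_counter`, and the `count` parameter are hypothetical names.

```python
from typing import Callable

# Any function mapping text to a token count can be plugged in.
TokenCounter = Callable[[str], int]


def heuristic_counter(text: str) -> int:
    """Current approach: roughly 4 characters per token."""
    return len(text) // 4


def estimate_tokens(messages: list[dict], count: TokenCounter = heuristic_counter) -> int:
    """Sum per-message token counts using the supplied counter."""
    return sum(count(m["content"]) for m in messages)


# The heuristic scores this 7-character Japanese string as a single token,
# which shows how it can undercount non-Latin text.
print(heuristic_counter("こんにちは世界"))  # prints 1
```

A tokenizer-backed counter could then be passed as `count`, for example a function that returns the length of the token list produced by tiktoken or a model-specific tokenizer, without changing the check node's logic.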