Summarization — Architecture

This content was migrated from Documentation/SUMMARIZATION_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

Token cost and latency grow roughly linearly with conversation length, because each turn sends the accumulated history to the model. The Summarization System caps this growth by compressing older messages into a fixed-size summary while keeping the most recent exchanges verbatim.

Driving requirements:

  • Bounded context size — Token usage must not grow unboundedly with conversation length
  • No information cliff — Summarization must preserve key decisions, facts, and unresolved items rather than truncating
  • Computation-only node — The summarization node must not write to the database directly; persistence happens atomically at end-of-turn

Architecture Overview

flowchart TD
    subgraph Entry["Session Start"]
        SI[Session Init] --> CHECK{Summary exists?}
        CHECK -->|yes| SMART["Load: summary + last 4 msgs"]
        CHECK -->|no| FULL["Load: all messages"]
    end

    subgraph Trigger["Summarization Check"]
        SMART --> SC[Summarization Check]
        FULL --> SC
        SC --> NEED{">20 msgs OR >4000 tokens?"}
        NEED -->|yes| SUM[Summarization Node]
        NEED -->|no| CL[Context Loader]
    end

    subgraph Summarize["Summarization"]
        SUM -->|LLM call| GEN[Generate Summary]
        GEN --> TITLE[Regenerate Title]
        TITLE --> CL
    end

    subgraph Persist["End of Turn"]
        MP[Message Persist Node] -->|atomic write| DB[(PostgreSQL)]
    end

Component Responsibilities

| Component | Responsibility |
|---|---|
| Session Init | Smart-loads chat history: summary + last 4 messages if a summary exists, all messages otherwise |
| Summarization Check | Evaluates message count and token estimate against thresholds. Sets the needs_summarization flag |
| Summarization Node | Generates summary via LLM (keeps last 4 messages verbatim). Regenerates chat title. Computation-only: no DB writes |
| Message Persist Node | Writes summary and title to the database atomically at end of turn |
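
The Summarization Check can be sketched roughly as follows. The two thresholds come from the architecture diagram and the ~4 characters per token estimate from Known Trade-offs and Debt; the function names, the message shape (a dict with a `content` key), and the pluggable `count_tokens` parameter are assumptions made for illustration, not the actual implementation.

```python
from typing import Callable

MAX_MESSAGES = 20      # threshold shown in the architecture diagram
MAX_TOKENS = 4000      # threshold shown in the architecture diagram
CHARS_PER_TOKEN = 4    # rough heuristic listed under Known Trade-offs and Debt


def estimate_tokens(text: str) -> int:
    """Approximate the token count as characters divided by four."""
    return len(text) // CHARS_PER_TOKEN


def needs_summarization(
    messages: list[dict],
    count_tokens: Callable[[str], int] = estimate_tokens,  # hypothetical hook
) -> bool:
    """Return True when either threshold is strictly exceeded."""
    total_tokens = sum(count_tokens(m["content"]) for m in messages)
    return len(messages) > MAX_MESSAGES or total_tokens > MAX_TOKENS
```

Passing the counter as a parameter would let a real tokenizer replace the heuristic without touching the threshold logic.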

Data Model

| Field | Location | Purpose |
|---|---|---|
| conversation_summary | chats table + state | Compressed conversation history |
| needs_summarization | state only | Boolean flag set by the check node |
| summarization_occurred | state only | Flag that tells message_persist_node a write is needed |
| chat_title | chats table + state | Regenerated title |
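
As a rough illustration, the state fields above could be typed like this. The field names come from this document; the class name `SummarizationState`, the helper `initial_state`, and the concrete types are assumptions.

```python
from typing import Optional, TypedDict


class SummarizationState(TypedDict, total=False):
    messages_history: list[dict]          # messages loaded by Session Init
    conversation_summary: Optional[str]   # also persisted in the chats table
    chat_title: Optional[str]             # also persisted in the chats table
    needs_summarization: bool             # state only, set by the check node
    summarization_occurred: bool          # state only, read by message_persist_node


def initial_state(
    messages: list[dict], summary: Optional[str] = None
) -> SummarizationState:
    """Build the state as it might look right after Session Init (hypothetical helper)."""
    return SummarizationState(
        messages_history=messages,
        conversation_summary=summary,
        needs_summarization=False,
        summarization_occurred=False,
    )
```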

Key Design Decisions

1. Computation-Only Node

  • Chosen: Summarization node only computes — all DB writes happen in message_persist_node
  • Rejected: Writing summary directly to DB in the summarization node
  • Rationale: Atomic persistence at end-of-turn avoids partial writes if the graph fails after summarization but before response generation
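
A minimal sketch of this split, under stated assumptions: the node returns a state update and never touches the database, and the persist step writes summary and title in a single transaction. Here `sqlite3` stands in for PostgreSQL so the example is self-contained; the `chats` column layout, the function signatures, and the injected `summarize` and `make_title` callables are hypothetical.

```python
import sqlite3
from typing import Callable


def summarization_node(
    state: dict,
    summarize: Callable[[list[dict]], str],   # stand-in for the LLM call
    make_title: Callable[[str], str],         # stand-in for title generation
) -> dict:
    """Compute summary and title and return them as a state update; no DB access."""
    summary = summarize(state["messages_history"])
    return {
        "conversation_summary": summary,
        "chat_title": make_title(summary),
        "summarization_occurred": True,
    }


def message_persist_node(state: dict, conn: sqlite3.Connection, chat_id: int) -> None:
    """Write summary and title in one transaction, only if summarization ran."""
    if not state.get("summarization_occurred"):
        return
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE chats SET conversation_summary = ?, chat_title = ? WHERE id = ?",
            (state["conversation_summary"], state["chat_title"], chat_id),
        )
```

If the graph fails between the two steps, the database still holds the previous summary and title, because nothing was written before the persist step.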

2. Smart Loading Over Full Loading

  • Chosen: When a summary exists, load only summary + last 4 messages
  • Rejected: Always load all messages and summarize in-place
  • Rationale: Loading 30 messages just to discard 26 wastes database reads. With smart loading, a session start reads only the summary and the last 4 messages, however long the conversation is
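
The difference between the two loading paths could look roughly like this. `sqlite3` again stands in for PostgreSQL; the `messages` table, its columns, and the `load_history` function are assumptions for illustration only.

```python
import sqlite3
from typing import Optional

RECENT_MESSAGES = 4  # messages kept verbatim alongside the summary


def load_history(
    conn: sqlite3.Connection, chat_id: int, summary: Optional[str]
) -> list[str]:
    """Fetch only the newest messages when a summary exists, else every message."""
    if summary:
        rows = conn.execute(
            "SELECT content FROM messages WHERE chat_id = ? ORDER BY id DESC LIMIT ?",
            (chat_id, RECENT_MESSAGES),
        ).fetchall()
        rows.reverse()  # the query returns newest first; restore chronological order
    else:
        rows = conn.execute(
            "SELECT content FROM messages WHERE chat_id = ? ORDER BY id",
            (chat_id,),
        ).fetchall()
    return [content for (content,) in rows]
```

With the `LIMIT` in the query, the database returns four rows regardless of how many messages the chat contains.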

3. Iterative Summarization

  • Chosen: New summaries incorporate the previous summary content
  • Rejected: Re-summarizing from scratch each time
  • Rationale: Ensures context from very early in the conversation is preserved across multiple summarization cycles
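
One way to picture the iterative approach: each cycle feeds the previous summary plus the messages about to be dropped into the summarization prompt, and the last 4 messages stay verbatim. The prompt wording and both function names are hypothetical; the actual prompt is not shown in this document.

```python
from typing import Optional

KEEP_RECENT = 4  # messages kept verbatim, per the component table


def split_for_summary(messages: list[str]) -> tuple[list[str], list[str]]:
    """Split into (messages to compress, messages kept verbatim)."""
    return messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]


def build_summary_prompt(
    previous_summary: Optional[str], older_messages: list[str]
) -> str:
    """Assemble a prompt that folds new messages into the previous summary."""
    parts = []
    if previous_summary:
        parts.append("Previous summary:\n" + previous_summary)
    parts.append("New messages to fold in:\n" + "\n".join(older_messages))
    parts.append(
        "Write an updated summary that preserves key decisions, facts, "
        "and unresolved items."
    )
    return "\n\n".join(parts)
```

Because the previous summary is part of the input, details from the earliest turns can survive even though those messages are no longer loaded.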

Interfaces and Contracts

| Interface | Direction | Format | Consumer |
|---|---|---|---|
| Session Init → Summarization Check | Outbound | messages_history + conversation_summary in state | Summarization Check node |
| Summarization → State | Outbound | conversation_summary, chat_title, summarization_occurred | All downstream nodes, Message Persist |
| Summarization → LLM | Outbound | Structured output via llm_adapter.get_structured_output() | DeepSeek v32 (agent_type: conversation_summarization) |
| Title Generation → LLM | Outbound | Via chat_service.generate_title_from_summary() | Llama 4 Maverick (agent_type: title_generation) |

Known Trade-offs and Debt

| Item | Impact | Remediation |
|---|---|---|
| Token estimation heuristic | Uses ~4 chars/token, which is inaccurate for non-Latin scripts and code-heavy conversations | Use a proper tokenizer (tiktoken or model-specific) for accurate counts |
| No summary quality validation | Nothing checks that the generated summary actually captures key information | Add a summary quality score or user feedback mechanism |