Global Supervisor: Architecture

This content was migrated from Documentation/GLOBAL_SUPERVISOR.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The Global Supervisor exists because Swisper needs a single, deterministic orchestration layer that coordinates every aspect of a conversation turn — from intent classification through memory retrieval, agent delegation, and response generation.

The key driving requirements behind its design are:

Deterministic flow control — Each user message must follow a predictable path through well-defined processing stages, making debugging and auditing possible
Entity-first disambiguation — Facts must never be stored against the wrong entity. Entity resolution must complete before fact extraction proceeds
Minimal time-to-first-token — Simple queries should skip expensive processing stages via optimization flags, while complex queries get full pipeline treatment
State persistence for HITL — When the system needs to ask the user a clarification question (e.g., disambiguation), the entire conversation state must be checkpointed so execution can resume exactly where it left off

Architecture Overview

The Global Supervisor is implemented as a LangGraph StateGraph — a directed graph where each node is a processing step and edges define the flow between them. All nodes share a single GlobalSupervisorState (TypedDict) that accumulates data as the message progresses through the pipeline.
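The node-and-shared-state pattern can be illustrated without LangGraph itself. The following is a plain-Python stand-in, not the LangGraph API and not the project's code: each node is a function that receives the state and returns a partial update, and a small runner merges the update and follows a routing function to the next node. All names here (`MiniState`, `classify`, `route_after_classify`, `run`, and the toy word-count rule) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class MiniState(TypedDict, total=False):
    """Hypothetical, heavily reduced stand-in for GlobalSupervisorState."""
    user_message: str
    route: str
    visited: List[str]


Node = Callable[[MiniState], MiniState]


def _visit(state: MiniState, name: str) -> List[str]:
    # Build a new list so nodes never mutate the shared state in place.
    return state.get("visited", []) + [name]


def session_init(state: MiniState) -> MiniState:
    return {"visited": _visit(state, "session_init")}


def classify(state: MiniState) -> MiniState:
    # Toy rule standing in for the real intent classifier.
    route = "simple" if len(state["user_message"].split()) <= 2 else "complex"
    return {"route": route, "visited": _visit(state, "classify")}


def simple_text(state: MiniState) -> MiniState:
    return {"visited": _visit(state, "simple_text")}


def global_planner(state: MiniState) -> MiniState:
    return {"visited": _visit(state, "global_planner")}


def route_after_classify(state: MiniState) -> Optional[str]:
    """Conditional edge: pick the next node name from the current state."""
    return "simple_text" if state["route"] == "simple" else "global_planner"


NODES: Dict[str, Node] = {
    "session_init": session_init,
    "classify": classify,
    "simple_text": simple_text,
    "global_planner": global_planner,
}

# Each entry maps a node to a function that picks its successor (None ends the run).
EDGES: Dict[str, Callable[[MiniState], Optional[str]]] = {
    "session_init": lambda s: "classify",
    "classify": route_after_classify,
    "simple_text": lambda s: None,
    "global_planner": lambda s: None,
}


def run(user_message: str) -> MiniState:
    state: MiniState = {"user_message": user_message}
    current: Optional[str] = "session_init"
    while current is not None:
        state.update(NODES[current](state))  # merge the node's partial update
        current = EDGES[current](state)
    return state
```

Calling `run("Hi")` walks the short path through `simple_text`, while a longer message is routed to `global_planner`. The real graph adds checkpointing and many more nodes, but the accumulate-and-route mechanics are the same idea.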

The graph is organized into seven logical stages:

flowchart TB
    subgraph Entry["1. Entry & Session"]
        START([User Message]) --> SI[Session Init]
        SI --> SC[Summarization Check]
        SC -->|needs summarization| SUM[Summarization]
        SC -->|no| CL
        SUM --> CL[Context Loader]
    end

    subgraph HITL["2. HITL Resume"]
        CL --> UIL[HITL Handler]
        UIL -->|has interrupt| RESUME{Resume Type}
        UIL -->|no interrupt| IC
        RESUME -->|disambiguation| IC
        RESUME -->|agent HITL| AE
    end

    subgraph Classification["3. Intent Classification"]
        IC[Intent Classification]
    end

    subgraph Memory["4. Memory Pipeline"]
        IC --> RR{Retrieval Router}
        RR -->|skip retrieval| MA
        RR -->|semantic only| ER
        RR -->|parallel retrieval| ER
        ER[Entity Resolution] --> SR[Semantic Retrieval]
        SR --> TR[Temporal Retrieval]
        TR --> MA[Memory Assembly]
        MA --> FE[Fact Extraction]
        FE --> EM[Extraction Merge]
    end

    subgraph Disambiguation["5. Disambiguation"]
        EM --> AMB{Entity Ambiguity?}
        AMB -->|blocking| DB[Disambiguation Blocking]
        AMB -->|non-blocking| DS[Disambiguation Simple]
        AMB -->|none| ROUTE
        DB --> MP
        DS --> MP
    end

    subgraph Planning["6. Planning & Execution"]
        ROUTE{Route}
        ROUTE -->|simple chat| UIR[UI Router]
        ROUTE -->|complex| GP[Global Planner]
        GP --> AE[Agent Execution]
        AE --> GP2{More agents?}
        GP2 -->|yes| GP
        GP2 -->|no| UIR
    end

    subgraph Response["7. Response Generation"]
        UIR --> UINODES{Response Type}
        UINODES -->|simple| ST[Simple Text]
        UINODES -->|complex| CT[Complex Text]
        UINODES -->|hitl| HT[HITL Text]
        ST --> MP[Message Persist]
        CT --> MP
        HT --> MP
    end

    MP --> DONE([End / HITL Interrupt])

Flow summary: A message enters at Session Init, gets classified, passes through memory retrieval and entity resolution, optionally triggers disambiguation, gets routed to either direct response or planner-driven agent execution, and exits through a specialized UI response node.

Component Responsibilities

Component	Responsibility
Session Init	Loads chat history, initializes token tracking, sets up turn context
Summarization Check / Summarization	Detects when conversation history exceeds thresholds (>20 messages or >4,000 tokens) and compresses it
Context Loader	Loads avatar configuration, presentation rules, and preloaded facts (parallel execution)
HITL Handler	Detects pending human-in-the-loop interrupts and routes to the correct resume point
Intent Classification	Classifies intent (simple/complex/greeting), extracts entities, and sets optimization flags (`has_extractable_facts`, `has_preferences`, `needs_semantic_retrieval`)
Retrieval Router	Decides which memory retrieval path to take based on optimization flags
Entity Resolution	Resolves mentioned entities against the user's contact database; detects ambiguity
Semantic Retrieval	Vector search for relevant facts based on message content
Temporal Retrieval	Time-based fact retrieval (e.g., upcoming events, recent changes)
Memory Assembly	Merges all retrieved facts into a unified context
Fact Extraction	Extracts new facts from the user's message (runs in parallel with entity resolution)
Extraction Merge	Links extracted facts to resolved entities and persists them to the database
Disambiguation (Blocking)	Pauses execution and asks the user which entity they mean — generates a HITL interrupt
Disambiguation (Simple)	Non-blocking disambiguation: answers the question and appends a "by the way, which X?" follow-up
Global Planner	Creates a multi-step execution plan determining which domain agents to invoke and in what order
Agent Execution	Executes domain agents (Productivity, Wealth, Research, etc.) and collects results
UI Response Nodes	Specialized response generators: Simple Text, Complex Text (with agent result synthesis), HITL Text
Message Persist	Saves the assistant's response and conversation metadata to the database
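The Summarization Check row gives two thresholds, more than 20 messages or more than 4,000 tokens. A minimal sketch of that check might look like the following. The token estimate (roughly four characters per token) and the function names are assumptions for illustration; the real node presumably uses the model's tokenizer.

```python
from typing import List

MAX_MESSAGES = 20    # threshold stated in the component table
MAX_TOKENS = 4_000   # threshold stated in the component table


def estimate_tokens(text: str) -> int:
    """Crude assumption: about four characters per token."""
    return max(1, len(text) // 4)


def needs_summarization(messages: List[str]) -> bool:
    """Return True when history exceeds either the message or the token threshold."""
    total_tokens = sum(estimate_tokens(m) for m in messages)
    return len(messages) > MAX_MESSAGES or total_tokens > MAX_TOKENS
```

Both thresholds are strict inequalities here, matching the ">" in the table, so a history of exactly 20 short messages does not trigger summarization.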

Data Model

GlobalSupervisorState

The shared state is a Python TypedDict with these key domains:

Domain	Key Fields	Purpose
Conversation	`user_message`, `messages_history`, `chat_id`, `user_id`, `avatar_id`, `model`	Core conversation identity and history
Intent	`intent_classification` (route, entities, optimization flags)	Routing decision from classification
Memory	`memory_domain` (conversation context, facts), `resolved_entities`, `extracted_facts`, `pending_facts`	Retrieved and extracted knowledge
Disambiguation	`entity_ambiguity`, `btw_disambiguation_resolved`, `blocking_disambiguation_resolved`	Entity ambiguity tracking
Planning	`global_planner_decision`, `current_agent_result`, `recent_agent_results`	Execution plan and agent outputs
HITL	`user_in_the_loop` (UserInTheLoop model), `hitl_user_response`	Human-in-the-loop interrupt state
Response	`user_interface_response`, `presentation_rules`, `modality`	Generated response and display rules
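To make the table concrete, the sketch below declares the listed fields as a `TypedDict`. The field names come from the table; every type annotation, the nesting of the optimization flags inside `intent_classification`, and the class name `GlobalSupervisorStateSketch` are assumptions, and the real state has many more fields.

```python
from typing import Any, Dict, List, Optional, TypedDict


class IntentClassification(TypedDict, total=False):
    route: str                      # e.g. "simple", "complex", "greeting"
    entities: List[str]
    has_extractable_facts: bool
    has_preferences: bool
    needs_semantic_retrieval: bool


class GlobalSupervisorStateSketch(TypedDict, total=False):
    # Conversation
    user_message: str
    messages_history: List[Dict[str, Any]]
    chat_id: str
    user_id: str
    avatar_id: str
    model: str
    # Intent
    intent_classification: IntentClassification
    # Memory
    memory_domain: Dict[str, Any]
    resolved_entities: List[Dict[str, Any]]
    extracted_facts: List[Dict[str, Any]]
    pending_facts: List[Dict[str, Any]]
    # Disambiguation
    entity_ambiguity: Optional[Dict[str, Any]]
    btw_disambiguation_resolved: bool
    blocking_disambiguation_resolved: bool
    # Planning
    global_planner_decision: Optional[Dict[str, Any]]
    current_agent_result: Optional[Dict[str, Any]]
    recent_agent_results: List[Dict[str, Any]]
    # HITL (a dict stands in for the UserInTheLoop model)
    user_in_the_loop: Optional[Dict[str, Any]]
    hitl_user_response: Optional[str]
    # Response
    user_interface_response: Optional[str]
    presentation_rules: Dict[str, Any]
    modality: str


# total=False lets a node start from a partial state and fill in the rest.
initial: GlobalSupervisorStateSketch = {"user_message": "Hi", "chat_id": "chat-1"}
```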

State Persistence

Store	Role	Mechanism
Redis	Primary state checkpointer	Snapshots after each node via LangGraph checkpointer
PostgreSQL	Long-term fallback	Recovery when Redis state is evicted

The checkpointer uses chat_id as the thread identifier, enabling resume-from-checkpoint for HITL interrupts and crash recovery.
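The primary-plus-fallback arrangement can be sketched with two in-memory dictionaries playing the roles of Redis and PostgreSQL. This is an illustration only: the class `TwoTierCheckpointStore` is hypothetical, it does not use the LangGraph checkpointer protocol, and writing every snapshot to both stores is an assumption about how the fallback stays current.

```python
import copy
from typing import Any, Dict, Optional


class TwoTierCheckpointStore:
    """Hypothetical stand-in: two dicts play the roles of Redis and PostgreSQL."""

    def __init__(self) -> None:
        self.primary: Dict[str, Dict[str, Any]] = {}   # fast, may be evicted
        self.fallback: Dict[str, Dict[str, Any]] = {}  # durable copy

    def save(self, chat_id: str, state: Dict[str, Any]) -> None:
        # Deep-copy so later mutation of the live state cannot alter the snapshot.
        self.primary[chat_id] = copy.deepcopy(state)
        self.fallback[chat_id] = copy.deepcopy(state)

    def load(self, chat_id: str) -> Optional[Dict[str, Any]]:
        snapshot = self.primary.get(chat_id)
        if snapshot is None:
            # Primary entry was evicted or never written: recover and re-warm.
            snapshot = self.fallback.get(chat_id)
            if snapshot is not None:
                self.primary[chat_id] = copy.deepcopy(snapshot)
        return copy.deepcopy(snapshot) if snapshot is not None else None
```

Keying both stores by `chat_id` mirrors the thread identifier described above: a resumed HITL turn only needs the chat ID to find the state it paused with.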

Key Design Decisions

1. Synchronous Entity Resolution Before Fact Storage

Chosen: Entity resolution runs synchronously, and extracted facts are not persisted until entities are resolved.
Rejected: Parallel entity resolution and fact extraction with post-hoc linking.
Rationale: When a user says "Thomas is traveling to Mallorca" and there are two contacts named Thomas, storing the fact before knowing which Thomas it refers to leads to orphaned or misattributed data. The 500–1,500 ms latency cost is an acceptable price for data correctness.
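The "two Thomases" case can be sketched as follows. The matcher, the data classes, the return values, and the contact names are all invented for illustration; the point is only the ordering, where resolution finishes before anything is written and an ambiguous or unresolved mention leaves the fact pending.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Contact:
    contact_id: str
    name: str


@dataclass
class MemoryStore:
    persisted: List[Dict[str, str]] = field(default_factory=list)
    pending: List[Dict[str, str]] = field(default_factory=list)


def resolve_entity(mention: str, contacts: List[Contact]) -> List[Contact]:
    """Return every contact whose first name matches the mention (toy matcher)."""
    return [c for c in contacts if c.name.split()[0].lower() == mention.lower()]


def store_fact(mention: str, fact: str, contacts: List[Contact], store: MemoryStore) -> str:
    candidates = resolve_entity(mention, contacts)  # must finish before any write
    if len(candidates) == 1:
        store.persisted.append({"contact_id": candidates[0].contact_id, "fact": fact})
        return "persisted"
    # Zero or several matches: hold the fact instead of guessing an owner.
    store.pending.append({"mention": mention, "fact": fact})
    return "needs_disambiguation" if candidates else "unresolved"
```

With two contacts named Thomas, `store_fact("Thomas", ...)` returns `"needs_disambiguation"` and writes nothing to `persisted`, which is the guarantee the design decision is buying with its latency cost.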

2. LangGraph StateGraph Over Custom Orchestration

Chosen: LangGraph StateGraph with conditional edges and built-in checkpointing
Rejected: Custom async pipeline, event-driven choreography
Rationale: LangGraph provides deterministic execution, built-in state persistence (checkpointers), native HITL support via interrupt(), and visual debugging. The structured graph makes it straightforward to add, remove, or reorder nodes

3. Optimization Flags for Node Skipping

Chosen: Intent classification sets boolean flags (has_extractable_facts, needs_semantic_retrieval, has_preferences) that downstream routing functions use to skip unnecessary nodes
Rejected: Running all nodes for every message; lazy evaluation at each node
Rationale: A greeting like "Hi" doesn't need fact extraction (~9,000 tokens) or semantic retrieval (~500 ms). On simple messages the flags save 2–3 seconds and ~15,000 tokens while keeping the graph structure uniform.
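A rough sketch of flag-driven node skipping follows. The three flag names are taken from the text; the mapping from flags to nodes is an assumption (including the choice to run entity resolution whenever either retrieval or extraction is needed), as are the class and function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OptimizationFlags:
    has_extractable_facts: bool = False
    has_preferences: bool = False
    needs_semantic_retrieval: bool = False


def nodes_to_run(flags: OptimizationFlags) -> List[str]:
    """Return the memory-pipeline nodes a message would visit (assumed mapping)."""
    needs_extraction = flags.has_extractable_facts or flags.has_preferences
    nodes: List[str] = []
    if flags.needs_semantic_retrieval or needs_extraction:
        nodes.append("entity_resolution")
    if flags.needs_semantic_retrieval:
        nodes += ["semantic_retrieval", "temporal_retrieval"]
    nodes.append("memory_assembly")  # reached even on the skip-retrieval path
    if needs_extraction:
        nodes += ["fact_extraction", "extraction_merge"]
    return nodes
```

A greeting with all flags unset visits only `memory_assembly`, while a message with every flag set walks the full pipeline. Because the decision lives in one routing function, the graph structure itself stays the same for both cases.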

Interfaces and Contracts

Interface	Direction	Format	Consumer
Orchestration Service → GlobalSupervisor	Inbound	`GlobalSupervisor.run(user_message, model, chat_id, user_id, ...)`	Orchestration Service (API layer)
GlobalSupervisor → Domain Agents	Outbound	`DomainAgentInterface.execute(state)` via Agent Registry	Productivity, Wealth, Research, and other domain agents
GlobalSupervisor → Memory System	Bidirectional	Repository pattern (read facts, write extracted facts)	Fact storage, entity resolution, semantic search
GlobalSupervisor → Redis/PostgreSQL	Outbound	LangGraph checkpointer protocol	State persistence for HITL and recovery
GlobalSupervisor → UI (streaming)	Outbound	`SupervisorResponseChunkEvent` (Server-Sent Events)	Frontend text/voice rendering
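The outbound agent contract, `DomainAgentInterface.execute(state)` looked up through a registry, could be modelled roughly as below. Only the method name and its `state` argument come from the table; the return type, the `AgentRegistry` methods, and the toy `EchoResearchAgent` are assumptions made for the example.

```python
from typing import Any, Dict, Protocol


class DomainAgentInterface(Protocol):
    """Structural type: any object with a matching execute() qualifies."""

    def execute(self, state: Dict[str, Any]) -> Dict[str, Any]:
        ...


class AgentRegistry:
    """Hypothetical registry keyed by agent name."""

    def __init__(self) -> None:
        self._agents: Dict[str, DomainAgentInterface] = {}

    def register(self, name: str, agent: DomainAgentInterface) -> None:
        self._agents[name] = agent

    def get(self, name: str) -> DomainAgentInterface:
        try:
            return self._agents[name]
        except KeyError:
            raise LookupError(f"no agent registered as {name!r}") from None


class EchoResearchAgent:
    """Toy agent used only to exercise the interface."""

    def execute(self, state: Dict[str, Any]) -> Dict[str, Any]:
        summary = state["user_message"].upper()
        return {"current_agent_result": {"agent": "research", "summary": summary}}


registry = AgentRegistry()
registry.register("research", EchoResearchAgent())
result = registry.get("research").execute({"user_message": "find flights"})
```

Using a `Protocol` keeps agents decoupled from the supervisor: an agent does not need to inherit from a supervisor-owned base class, it only needs the right method shape.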

Known Trade-offs and Debt

Item	Impact	Remediation
Large `build_graph()` method	The graph construction in `agent.py` is ~1,000 lines. Adding or reordering nodes requires reading the full method to understand edge dependencies	Potential refactor: extract stage-level subgraphs (entry, memory, planning, response) into separate builder functions
Legacy `memory.py` node	The old monolithic memory node still exists and is used only for the greeting flow. All other flows use the split memory pipeline (retrieval_router → semantic → temporal → assembly)	Remove legacy node once greeting flow is migrated to the split pipeline
State field sprawl	`GlobalSupervisorState` has grown to ~77 fields across multiple concerns, making it hard to know which fields are relevant at each node	Consider namespaced sub-states or a state schema validation layer
No retry on agent execution failures	If a domain agent raises an exception, the supervisor catches it and reports the failure but does not retry	Add configurable retry with exponential backoff for transient failures
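The remediation proposed for agent failures, configurable retry with exponential backoff, could take a shape like the one below. Nothing like this exists in the codebase according to the table, so every name and default here (`run_with_retry`, `TransientAgentError`, three attempts, a 0.5 s base delay) is a hypothetical starting point.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientAgentError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TransientAgentError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying listed exceptions with a doubling delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the supervisor report the failure
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable")
```

Exceptions outside `retry_on` propagate immediately, so permanent failures such as invalid input are not retried. The `sleep` parameter is injectable so that tests can record delays instead of waiting.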