
HITL System — Architecture

This content was migrated from Documentation/SWISPER_HUMAN_IN_THE_LOOP_ARCHITECTURE.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The HITL (human-in-the-loop) System exists because AI agents sometimes need human input to proceed safely and correctly. Without it, agents would either guess (risking errors) or fail silently (losing user trust).

The driving requirements are:

  • Agents never talk directly to users: all user-facing questions go through a centralized handler for consistent UX
  • Indefinite pause/resume: the system must survive server restarts and arbitrarily long user pauses via LangGraph checkpoint persistence
  • Multi-turn support: complex tasks may require multiple clarification rounds before completion
  • Blocking vs non-blocking: entity disambiguation must separate ambiguities that have to be resolved before the answer can be given (blocking) from incidental ones that can be raised after the answer (non-blocking)

Architecture Overview

flowchart TB
    subgraph Triggers["HITL Triggers"]
        DA[Domain Agent] -->|WAITING_FOR_INPUT| AE[Agent Execution Node]
        ER[Entity Resolution] -->|ambiguity detected| DB[Disambiguation Blocking]
    end

    subgraph Orchestration["HITL Orchestration"]
        AE -->|user_in_the_loop.is_waiting| UIR[UI Router]
        DB -->|streams question| UIR
        UIR --> HT[HITL Text Node]
        HT -->|streams question to frontend| HANDLER[HITL Handler]
        HANDLER -->|"interrupt()"| PAUSE([Graph Pauses])
        PAUSE -->|checkpoint to Redis| WAIT[Wait for User]
    end

    subgraph Resume["Resume Flow"]
        WAIT -->|user responds| CMD["Command(resume=answer)"]
        CMD --> HANDLER2[HITL Handler]
        HANDLER2 -->|escaping?| NEW[Process New Intent]
        HANDLER2 -->|answer received| ROUTE{Resume Route}
        ROUTE -->|agent HITL| AE2[Agent Execution]
        ROUTE -->|disambiguation| DR[Disambiguation Resolution]
    end

    subgraph Resolution["Disambiguation Resolution"]
        DR -->|resolve entity| FACTS[Persist Pending Facts]
        DR -->|create new entity| CNE[Create New Entity Node]
        FACTS --> CONTINUE[Continue Graph]
        CNE --> CONTINUE
    end

Flow summary: A trigger (agent or disambiguation) creates a UserInTheLoop payload with is_waiting=True. The HITL Handler calls interrupt() to pause the graph and checkpoint state. When the user responds, Command(resume=answer) restores execution. The handler routes to either the original agent or disambiguation resolution based on the interrupt source.
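
The routing step at the end of the flow can be sketched as a pure function over the UserInTheLoop payload. This is a minimal sketch, not the actual handler: the node names are hypothetical, and it assumes disambiguation interrupts can be recognized by last_question_type (the real handler may inspect source_node instead).

```python
from typing import Any, Mapping

# Hypothetical node names; the real graph may register different ones.
PROCESS_NEW_INTENT = "process_new_intent"
AGENT_EXECUTION = "agent_execution"
DISAMBIGUATION_RESOLUTION = "disambiguation_resolution"


def route_after_resume(user_in_the_loop: Mapping[str, Any]) -> str:
    """Pick the next node once the user's reply has been written to state."""
    # An escaping user abandons the paused task, so the reply is a new request.
    if user_in_the_loop.get("escaping"):
        return PROCESS_NEW_INTENT
    # Assumption: disambiguation interrupts are tagged via last_question_type.
    if user_in_the_loop.get("last_question_type") == "disambiguation":
        return DISAMBIGUATION_RESOLUTION
    # Everything else is an agent-originated clarification or confirmation.
    return AGENT_EXECUTION
```

Checking escaping first means an abandoned disambiguation is never resolved against a reply that was not meant as an answer.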

Component Responsibilities

| Component | Responsibility |
| --- | --- |
| HITL Handler Node | Central orchestrator: calls interrupt(), processes user response, detects escaping, routes to resume target |
| Disambiguation Blocking Node | Generates ask-only questions for critical entity ambiguity. Streams question, sets is_waiting=True |
| Disambiguation Resolution Node | Resolves user's entity choice via fast-path (exact match) or LLM semantic matching. Persists pending facts with correct entity |
| Create New Entity Node | Handles "Someone else" flow: LLM extracts role/context, creates new Person record |
| HITL Text Node | Formats and streams HITL questions to the frontend (bypasses LLM) |
| Agent Execution Node | Detects WAITING_FOR_INPUT status from domain agents and propagates UserInTheLoop to state |
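
The two-tier matching in the Disambiguation Resolution Node might look roughly like the sketch below. It is illustrative only: the function names are hypothetical, the case-insensitive comparison is an assumption, and the semantic matcher is injected as a plain callable so that a stub can stand in for the LLM call.

```python
from typing import Callable, Optional, Sequence

Matcher = Callable[[str, Sequence[str]], Optional[str]]


def resolve_entity_choice(
    answer: str,
    options: Sequence[str],
    semantic_match: Optional[Matcher] = None,
) -> Optional[str]:
    """Return the option the user picked, or None if no option fits."""
    normalized = answer.strip().casefold()
    # Fast path: the reply is exactly one of the offered options.
    for option in options:
        if option.strip().casefold() == normalized:
            return option
    # Slow path: delegate to the injected matcher (an LLM call in the real system).
    if semantic_match is not None:
        return semantic_match(answer, options)
    return None


def keyword_stub(answer: str, options: Sequence[str]) -> Optional[str]:
    """Stand-in for the LLM: pick the single option mentioning the reply's last word."""
    words = answer.split()
    if not words:
        return None
    keyword = words[-1].casefold()
    hits = [option for option in options if keyword in option.casefold()]
    return hits[0] if len(hits) == 1 else None


options = ["Anna Schmidt (colleague)", "Anna Weber (sister)"]
exact = resolve_entity_choice("anna weber (sister)", options)
fuzzy = resolve_entity_choice("my sister", options, keyword_stub)
```

Trying the exact match first keeps the common case (the user picks one of the offered options verbatim) free of any model call.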

Data Model

UserInTheLoop (Pydantic Model)

| Field | Type | Purpose |
| --- | --- | --- |
| question | str | Current question to display |
| answer | str | User's answer (populated on resume) |
| is_waiting | bool | True = graph is paused waiting for user |
| escaping | bool | True = user wants to abandon current task |
| target_agent | str | Which agent to resume (e.g., "productivity_agent") |
| source_node | str | Which node triggered the interrupt |
| needs_clarification | bool | True = the question asks for missing data |
| needs_confirmation | bool | True = the question asks for approval of a risky action |
| last_question_type | str | "clarification", "confirmation", or "disambiguation" |
| stored_data | dict | Context preserved across the interrupt (search results, drafts, entity options) |
| tool_results | dict | Tool execution results preserved across the interrupt |
| previous_questions / previous_answers | list[str] | Multi-turn history |
| modality | str | "text" or "voice" |
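
For orientation, the fields above can be written out as a class. The sketch uses a stdlib dataclass as a dependency-free stand-in for the Pydantic model; all default values and the record_answer helper are assumptions for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class UserInTheLoop:
    """Stdlib stand-in for the Pydantic model; defaults are assumptions."""

    question: str = ""
    answer: str = ""
    is_waiting: bool = False
    escaping: bool = False
    target_agent: str = ""
    source_node: str = ""
    needs_clarification: bool = False
    needs_confirmation: bool = False
    last_question_type: str = ""
    stored_data: dict[str, Any] = field(default_factory=dict)
    tool_results: dict[str, Any] = field(default_factory=dict)
    previous_questions: list[str] = field(default_factory=list)
    previous_answers: list[str] = field(default_factory=list)
    modality: str = "text"

    def record_answer(self, answer: str) -> None:
        """Hypothetical helper: store the reply and archive the finished round."""
        self.answer = answer
        self.is_waiting = False
        self.previous_questions.append(self.question)
        self.previous_answers.append(answer)


pending = UserInTheLoop(
    question="Which Anna do you mean?",
    is_waiting=True,
    target_agent="productivity_agent",
    needs_clarification=True,
    last_question_type="clarification",
)
pending.record_answer("Anna Weber")
```

Keeping the question and answer histories as parallel lists matches the table; a later clarification round would overwrite question and append to both lists again.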

State Persistence

| Store | What's Saved | Mechanism |
| --- | --- | --- |
| Redis | Full graph state including UserInTheLoop, agent context, partial results | LangGraph checkpoint via interrupt() |
| PostgreSQL | Fallback state for crash recovery | StatePersistenceManager |

Key Design Decisions

1. LangGraph interrupt() Over Custom Polling

  • Chosen: Native LangGraph interrupt() with Command(resume=...) for pause/resume
  • Rejected: Custom polling loop, WebSocket-based waiting, database flag checking
  • Rationale: interrupt() integrates directly with LangGraph's checkpoint system, providing automatic state persistence, deterministic resume, and multi-turn support without custom infrastructure

2. Centralized Handler Over Per-Agent Questions

  • Chosen: All HITL requests flow through a single handler node that formats and delivers questions
  • Rejected: Each agent formats its own user-facing questions
  • Rationale: Consistent UX across all agents and channels. Single point for audit logging, A/B testing of question formats, and channel-specific adaptation (text vs voice)

3. Blocking vs Non-Blocking Disambiguation

  • Chosen: Two separate flows — blocking (ask first) and non-blocking (answer first, ask "by the way")
  • Rejected: Always blocking; always non-blocking; let the LLM decide
  • Rationale: Blocking every disambiguation adds unnecessary latency for incidental mentions. Non-blocking every time risks incorrect answers when entity identity matters. The entity resolution node sets a relevance field ("blocking" vs "non_blocking") based on whether the entity affects the answer
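
A minimal sketch of how the relevance field could drive the choice between the two flows. The node names are hypothetical, and raising on an unknown value is an assumption; the real graph may fall back to one of the flows instead.

```python
from typing import Any, Mapping

# Hypothetical node names for the two flows.
BLOCKING_NODE = "disambiguation_blocking"
ANSWER_FIRST_NODE = "answer_then_ask"


def route_disambiguation(entity_ambiguity: Mapping[str, Any]) -> str:
    """Choose the flow from the relevance set by the entity resolution node."""
    relevance = entity_ambiguity.get("relevance")
    if relevance == "blocking":
        # The entity affects the answer: ask before answering.
        return BLOCKING_NODE
    if relevance == "non_blocking":
        # Incidental mention: answer first, then ask "by the way".
        return ANSWER_FIRST_NODE
    # Assumption: an unexpected value is treated as a bug rather than guessed at.
    raise ValueError(f"unknown relevance: {relevance!r}")
```

Because the decision is a plain field lookup, the routing stays deterministic; the judgment about whether the entity matters is made once, upstream, by the entity resolution node.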

Interfaces and Contracts

| Interface | Direction | Format | Consumer |
| --- | --- | --- | --- |
| Domain Agents → HITL | Inbound | DomainAgentResult(status=WAITING_FOR_INPUT, user_in_the_loop=...) | Agent Execution Node |
| Entity Resolution → HITL | Inbound | entity_ambiguity dict in state with relevance field | Disambiguation Blocking Node |
| HITL → Frontend | Outbound | Streamed question via SupervisorResponseChunkEvent | Frontend chat UI |
| Frontend → HITL | Inbound | User message → Command(resume=user_message) | HITL Handler Node |
| HITL → Disambiguation Resolution | Outbound | UserInTheLoop.answer + entity_ambiguity context | Disambiguation Resolution Node |
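
The first contract in the table (Domain Agents → HITL) can be illustrated as follows. This is a sketch under stated assumptions: only the status and user_in_the_loop fields and the WAITING_FOR_INPUT value come from the table, while the COMPLETED status, the dict-based payload, and the propagate_hitl function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class AgentStatus(Enum):
    COMPLETED = "COMPLETED"  # hypothetical; only WAITING_FOR_INPUT is documented
    WAITING_FOR_INPUT = "WAITING_FOR_INPUT"


@dataclass
class DomainAgentResult:
    status: AgentStatus
    user_in_the_loop: Optional[dict[str, Any]] = None


def propagate_hitl(state: dict[str, Any], result: DomainAgentResult) -> dict[str, Any]:
    """Return the next state; only a waiting agent sets user_in_the_loop."""
    if result.status is AgentStatus.WAITING_FOR_INPUT and result.user_in_the_loop:
        # Copy the agent's payload and mark the graph as waiting for the user.
        payload = {**result.user_in_the_loop, "is_waiting": True}
        return {**state, "user_in_the_loop": payload}
    return state


state = {"messages": []}
waiting = DomainAgentResult(
    AgentStatus.WAITING_FOR_INPUT,
    {"question": "Send the email now?", "target_agent": "productivity_agent"},
)
next_state = propagate_hitl(state, waiting)
```

Returning a new dict rather than mutating state mirrors how graph nodes typically return state updates, and it keeps the agent's own payload untouched.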

Known Trade-offs and Debt

| Item | Impact | Remediation |
| --- | --- | --- |
| Policy management not yet implemented | The legacy doc describes a full PolicyManager service for tool-level policies (email approval, transfer thresholds). This is designed but not built | Implement PolicyManager when domain agents need configurable approval rules |
| Single active interrupt | Only one HITL question at a time per conversation. Agents needing multiple inputs must ask sequentially | Could batch questions into a single structured form, but adds UI complexity |
| No HITL analytics | No tracking of approval rates, common clarification patterns, or user response times | Add audit trail and analytics when compliance reporting is needed |