
HITL System — Overview

This content was migrated from Documentation/SWISPER_HUMAN_IN_THE_LOOP_ARCHITECTURE.md and restructured into audience sections. Review for accuracy against the current codebase.

What This Module Does

The HITL (Human-in-the-Loop) System is Swisper's mechanism for pausing a conversation to ask the user a question before continuing. When Swisper encounters a situation where it needs more information, confirmation, or a choice from the user, the HITL system handles the entire pause-ask-resume cycle.

This happens in three common scenarios:

  • Clarification — An agent needs missing information. For example, "I found 3 emails from Sarah — which one should I reply to?"
  • Confirmation — An agent is about to perform a risky or irreversible action. For example, "Confirm sending this email to an external domain?"
  • Disambiguation — The system found multiple matching contacts. For example, "Which Thomas did you mean — Thomas Weber (colleague) or Thomas Schmidt (friend)?"

The key design principle is that domain agents never talk directly to users. Instead, they signal that they need input, and the HITL system handles formatting the question, pausing execution, waiting for the answer, and resuming exactly where things left off — even if the user takes hours to respond or the server restarts in between.
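The division of labor described above can be sketched with plain dataclasses. This is a minimal illustration, not Swisper's code. WAITING_FOR_INPUT and UserInTheLoop come from this page. The fields (`kind`, `question`, `options`) and the names `AgentResult`, `email_agent`, `format_hitl` and `handle` are hypothetical. The point it shows is that the agent only returns a structured payload, and one handler decides how the question is worded for the user.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AgentStatus(Enum):
    COMPLETED = "completed"
    WAITING_FOR_INPUT = "waiting_for_input"


@dataclass
class UserInTheLoop:
    # Hypothetical fields; the real payload shape is not documented here.
    kind: str  # "clarification", "confirmation" or "disambiguation"
    question: str
    options: list[str] = field(default_factory=list)


@dataclass
class AgentResult:
    status: AgentStatus
    answer: Optional[str] = None
    hitl: Optional[UserInTheLoop] = None


def email_agent(request: str) -> AgentResult:
    """Toy domain agent: it signals a need for input but never talks to the user."""
    if "external" in request:
        return AgentResult(
            status=AgentStatus.WAITING_FOR_INPUT,
            hitl=UserInTheLoop(
                kind="confirmation",
                question="Confirm sending this email to an external domain?",
                options=["yes", "no"],
            ),
        )
    return AgentResult(status=AgentStatus.COMPLETED, answer="Email sent.")


def format_hitl(payload: UserInTheLoop) -> str:
    """The single place where a payload becomes user-facing text."""
    text = payload.question
    if payload.options:
        numbered = (f"{i}. {opt}" for i, opt in enumerate(payload.options, 1))
        text += "\n" + "\n".join(numbered)
    return text


def handle(result: AgentResult) -> str:
    # Central handler: questions are formatted here, regardless of which agent asked.
    if result.status is AgentStatus.WAITING_FOR_INPUT and result.hitl is not None:
        return format_hitl(result.hitl)
    return result.answer or ""
```

Because formatting lives in `format_hitl`, a new agent gets consistent question wording for free. It only has to return the payload.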

Who It Serves

Persona | Need
End Users | Clear, well-formatted questions when Swisper needs their input, with the ability to respond at their own pace without losing context
Product Owners | Confidence that risky actions (sending emails, making changes) always go through user approval when business rules require it
Support Staff | Understanding of why Swisper paused a conversation and how the resume flow works

Key Capabilities

  • Centralized control — All user interactions go through a single handler, ensuring consistent formatting and behavior regardless of which agent triggered the question
  • State persistence — The entire conversation state is checkpointed to Redis before pausing. The system can resume hours later or after a server restart
  • Multi-turn clarification — Supports multiple rounds of questions in a single task (e.g., "who?" then "what subject?" then "confirm?")
  • Blocking vs non-blocking disambiguation — Critical ambiguity blocks and asks first; incidental ambiguity answers first and asks "by the way" as a follow-up
  • Escape handling — If the user changes the subject instead of answering, the system gracefully abandons the current task and processes the new intent
  • Channel-agnostic — Works identically for text and voice conversations
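The state-persistence capability amounts to serializing the whole conversation state before pausing and reloading it on resume. Below is a minimal sketch under stated assumptions:

- A dict-backed `DictStore` stands in for Redis.
- JSON serialization is assumed.
- `HitlCheckpointer`, its key format and its methods are hypothetical.
- `hitl_user_response` is the field named in the FAQ on this page.

```python
import json
from typing import Any, Optional


class DictStore:
    """Stand-in for Redis: keys map to serialized strings."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class HitlCheckpointer:
    """Hypothetical helper: persist state before pausing, reload it on resume."""

    def __init__(self, store: DictStore) -> None:
        self.store = store

    @staticmethod
    def _key(thread_id: str) -> str:
        return f"hitl:{thread_id}"

    def pause(self, thread_id: str, state: dict[str, Any]) -> None:
        # Serialize the whole conversation state so nothing lives only in process memory.
        self.store.set(self._key(thread_id), json.dumps(state))

    def resume(self, thread_id: str, user_message: str) -> dict[str, Any]:
        raw = self.store.get(self._key(thread_id))
        if raw is None:
            # Mirrors the documented limit: without the checkpoint there is nothing to resume.
            raise LookupError(f"no checkpoint for thread {thread_id!r}")
        self.store.delete(self._key(thread_id))
        state = json.loads(raw)
        state["hitl_user_response"] = user_message  # answer kept in state for auditing
        return state


store = DictStore()
HitlCheckpointer(store).pause("t1", {"task": "reply", "round": 1})
# A fresh instance plays the role of a restarted server; it works because the store outlived the first.
resumed = HitlCheckpointer(store).resume("t1", "Thomas Weber")
```

A multi-turn clarification would simply call `pause` again with the updated state after each answer. Only the store has to survive between turns.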

How It Fits in the Platform

  • Upstream (agent-triggered): Domain agents return WAITING_FOR_INPUT status with a UserInTheLoop payload when they need user input
  • Upstream (disambiguation-triggered): Entity resolution detects ambiguous contacts and triggers disambiguation via the HITL flow
  • Orchestration: The HITL Handler calls LangGraph's interrupt() to pause the graph and checkpoint state to Redis
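  • Resume: When the user responds, the graph is invoked again with Command(resume=user_message). The interrupted node runs again from its beginning, and this time interrupt() returns the user's answer instead of pausing
  • Downstream: Execution continues from the interrupted node; the agent receives the answer and completes its task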
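The pause and resume semantics listed above can be simulated without LangGraph. This standard-library toy is loosely modeled on LangGraph's `interrupt()` and `Command(resume=...)`. It does not use or reproduce LangGraph's real API, and `MiniGraph`, `GraphInterrupt`, the `pending_question` key and the two nodes are hypothetical.

The behavior it illustrates is the one described above:

- On the first run, `interrupt()` raises and the runner checkpoints the node index and state.
- On resume, the same node re-runs from its start, and `interrupt()` returns the answer.

```python
from dataclasses import dataclass
from typing import Any, Callable


class GraphInterrupt(Exception):
    """Raised by interrupt() to pause the run; carries the question payload."""

    def __init__(self, payload: Any) -> None:
        super().__init__(payload)
        self.payload = payload


@dataclass
class Command:
    """Resume instruction carrying the user's answer (name borrowed from LangGraph)."""
    resume: Any


Node = Callable[[dict, Callable[[Any], Any]], dict]


class MiniGraph:
    """Toy sequential runner; a dict stands in for the Redis checkpointer."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes
        self.checkpoints: dict[str, tuple[int, dict]] = {}

    def run(self, thread_id: str, inp: Any) -> dict:
        if isinstance(inp, Command):
            # Reload the paused node index and the state that node started with.
            index, state = self.checkpoints.pop(thread_id)
            pending = [inp.resume]
        else:
            index, state, pending = 0, dict(inp), []

        def interrupt(payload: Any) -> Any:
            # After a resume, the first interrupt() call returns the user's answer.
            if pending:
                return pending.pop()
            raise GraphInterrupt(payload)

        while index < len(self.nodes):
            try:
                state = self.nodes[index](state, interrupt)
            except GraphInterrupt as exc:
                # Checkpoint before surfacing the question; this node re-runs on resume.
                self.checkpoints[thread_id] = (index, state)
                return {"pending_question": exc.payload}
            index += 1
        return state


def pick_recipient(state: dict, interrupt: Callable[[Any], Any]) -> dict:
    if "recipient" not in state:
        answer = interrupt({"question": "Which Thomas did you mean?",
                            "options": ["Thomas Weber", "Thomas Schmidt"]})
        state = {**state, "recipient": answer}
    return state


def send_reply(state: dict, interrupt: Callable[[Any], Any]) -> dict:
    return {**state, "sent_to": state["recipient"]}


graph = MiniGraph([pick_recipient, send_reply])
paused = graph.run("t1", {"task": "reply"})                 # pauses inside pick_recipient
final = graph.run("t1", Command(resume="Thomas Weber"))     # re-runs pick_recipient, then send_reply
```

Because the interrupted node re-executes, any work placed before the `interrupt()` call runs twice. Keeping that work side-effect free (as `pick_recipient` does) is the safe pattern.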

Limits and Edge Cases

  • No proactive HITL — The system only asks questions when explicitly triggered by an agent or disambiguation
  • Single active interrupt — Only one HITL question can be active at a time per conversation
  • Redis dependency for resume — If the Redis checkpoint is lost before the user responds, the conversation cannot resume from the interrupt
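One way to enforce the single-active-interrupt limit is a per-conversation registry that refuses to open a second question while one is pending. This is a hypothetical sketch (`ActiveInterruptRegistry` and its methods are invented for illustration). In the real system the limit may simply follow from keeping one checkpoint per conversation thread.

```python
from typing import Optional


class ActiveInterruptRegistry:
    """Hypothetical helper: at most one open HITL question per conversation."""

    def __init__(self) -> None:
        self._open: dict[str, str] = {}

    def open(self, conversation_id: str, question: str) -> None:
        if conversation_id in self._open:
            # A second question while one is pending would make the next reply ambiguous.
            raise RuntimeError(f"conversation {conversation_id!r} already has an open question")
        self._open[conversation_id] = question

    def active(self, conversation_id: str) -> Optional[str]:
        return self._open.get(conversation_id)

    def close(self, conversation_id: str) -> Optional[str]:
        # Called once the user's answer has been consumed (or the task was abandoned).
        return self._open.pop(conversation_id, None)


registry = ActiveInterruptRegistry()
registry.open("c1", "Which Thomas did you mean?")
```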

FAQ

Q: What happens if I don't answer a HITL question for a long time? A: The system waits indefinitely. Your conversation state is saved, so you can come back hours or days later and execution resumes where it paused, as long as the Redis checkpoint has not been lost (see Limits and Edge Cases).

Q: Can I change my mind instead of answering the question? A: Yes. If you type something unrelated, the system detects this as an "escape" and abandons the current task. Your new message is processed as a fresh intent.
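How the escape check is actually performed is not described on this page, and a production system may well use an LLM classifier. The sketch below swaps in a simple word-matching heuristic to make the idea concrete. `PendingQuestion` and `classify_reply` are hypothetical.

The heuristic treats a reply as an answer in three cases:

- It gives the number of an offered option.
- It names an option in full.
- It contains a word that distinguishes one option from the others.

Anything else is treated as an escape.

```python
import re
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class PendingQuestion:
    question: str
    options: list[str] = field(default_factory=list)


def classify_reply(pending: PendingQuestion, reply: str) -> str:
    """Return "answer" or "escape" (hypothetical heuristic, not Swisper's logic)."""
    reply_words = _words(reply)
    if not reply_words:
        return "escape"
    text = reply.strip()
    # Numeric pick, e.g. "2" for the second listed option.
    if text.isdigit() and 1 <= int(text) <= len(pending.options):
        return "answer"
    option_words = [_words(o) for o in pending.options]
    shared = set.intersection(*option_words) if option_words else set()
    for words in option_words:
        # Full option named, or a word that only this option has (e.g. "Weber").
        if words <= reply_words or (words - shared) & reply_words:
            return "answer"
    return "escape"


pending = PendingQuestion("Which Thomas did you mean?", ["Thomas Weber", "Thomas Schmidt"])
```

On "escape", the caller would discard the pending task and route the message through normal intent handling as a fresh request.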

Q: Why does Swisper sometimes ask "which Thomas?" before answering, and sometimes after? A: It depends on whether knowing which Thomas is critical to the answer. If you asked "when is Thomas's birthday?" — the system blocks and asks first. If you asked "what is MCP?" and also mentioned Thomas — the system answers first and asks "by the way, which Thomas?" as a follow-up.
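The blocking versus non-blocking choice can be pictured as a branch on whether the ambiguity is critical to the answer. In this sketch, `critical` is an input flag; how Swisper decides criticality is not described here. `Ambiguity` and `plan_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Ambiguity:
    mention: str
    candidates: list[str]
    critical: bool  # hypothetical flag: does the answer depend on resolving this?


def plan_response(answer_draft: str, ambiguity: Ambiguity) -> list[str]:
    """Return the messages to send, in order."""
    body = f"which {ambiguity.mention} did you mean: {' or '.join(ambiguity.candidates)}?"
    if ambiguity.critical:
        # Blocking: ask first; the answer is produced only after the user picks.
        return [body[0].upper() + body[1:]]
    # Non-blocking: answer now, then ask "by the way" as a follow-up.
    return [answer_draft, f"By the way, {body}"]


birthday = Ambiguity("Thomas", ["Thomas Weber", "Thomas Schmidt"], critical=True)
mcp = Ambiguity("Thomas", ["Thomas Weber", "Thomas Schmidt"], critical=False)
```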

Q: Are HITL interactions tracked? A: Yes. The UserInTheLoop state (question, answer, stored context) is preserved in the conversation checkpoint. The hitl_user_response field stores the answer for audit purposes.