UI Response System — Architecture

This content was migrated from Documentation/UI_NODE_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The UI Response System was refactored from a single monolithic user_interface_node into a set of specialized response nodes. The original node handled all response variants (simple, complex, voice, HITL, disambiguation) in one function, making it difficult to maintain and test.

The driving requirements behind the current architecture are:

  • Separation of concerns: Each response variant has its own node with a focused responsibility, so a change to one variant does not affect the others
  • Fragment-based prompts: Prompt content lives in editable markdown files rather than Python strings, so non-developers can tune Swisper's voice
  • Streaming-first: Responses stream word-by-word to minimize perceived latency, and card placeholders are replaced during the stream
  • Content authority: An explicit authority chain tells the LLM which source to trust, which discourages fabrication when agent results are available

Architecture Overview

The UI Response System consists of six specialized nodes, a shared context extractor, a prompt assembly pipeline, and a streaming layer.

flowchart TB
    subgraph Input["From Global Supervisor"]
        STATE[GlobalSupervisorState]
    end

    subgraph Router["UI Router (routing.py)"]
        STATE --> UR{Response Type?}
    end

    subgraph Nodes["Specialized Response Nodes"]
        UR -->|simple chat| ST[Simple Text Node]
        UR -->|complex chat| CT[Complex Text Node]
        UR -->|HITL question| HT[HITL Text Node]
        UR -->|non-blocking disambig| DST[Disambiguation Simple]
        UR -->|blocking disambig| DCT[Disambiguation Complex]
        UR -->|BTW resolved| DA[Disambiguation Ack]
    end

    subgraph Shared["Shared Infrastructure"]
        SC[Shared Context Extractor]
        PA[Prompt Assembly]
        RS[Response Streaming]
    end

    ST --> SC
    CT --> SC
    DST --> SC
    DCT --> SC
    SC --> PA
    PA --> LLM[LLM Call]
    LLM --> RS

    HT -->|no LLM| RS
    DA -->|no LLM| RS

    RS --> EVT[SupervisorResponseChunkEvent]
    EVT --> FE[Frontend / Voice]

Flow summary: The UI Router selects a specialized node based on conversation context. Most nodes extract shared context, assemble a prompt from markdown fragments, call the LLM with streaming, and publish response chunks via the event bus. HITL and Acknowledgment nodes bypass the LLM entirely.
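
The routing step can be sketched as a plain function that maps state to a node name. This is a minimal illustration, not the contents of routing.py: the state keys (`disambiguation_resolved`, `hitl_question`, `disambiguation`, `agent_responses`), the returned node names, and the order of the checks are all assumptions made for the example.

```python
from typing import Any, Mapping


def route_ui_response(state: Mapping[str, Any]) -> str:
    """Pick a response node name from supervisor state.

    All state keys and the precedence of the checks are hypothetical.
    """
    # A resolved "by the way" question only needs a static acknowledgment.
    if state.get("disambiguation_resolved"):
        return "disambiguation_ack"
    # A pre-determined clarification question from an agent bypasses the LLM.
    if state.get("hitl_question"):
        return "hitl_text"
    # Hypothetical values: None, "blocking", or "non_blocking".
    disambiguation = state.get("disambiguation")
    if disambiguation == "blocking":
        return "disambiguation_complex_text"
    if disambiguation == "non_blocking":
        return "disambiguation_simple_text"
    # Agent results present: synthesize them; otherwise answer directly.
    if state.get("agent_responses"):
        return "complex_text"
    return "simple_text"
```

Keeping the router free of side effects means every branch can be unit-tested with a small dictionary instead of a full supervisor state.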

Component Responsibilities

| Component | Responsibility |
| --- | --- |
| Simple Text Node | Direct conversational responses for queries that don't involve domain agents. Uses simple.md prompt variant |
| Complex Text Node | Synthesizes results from domain agents into a coherent response. Handles card placeholder replacement during streaming. Uses complex.md prompt variant |
| HITL Text Node | Formats pre-determined clarification questions from agents. Bypasses the LLM and streams the question directly |
| Disambiguation Simple Text | Answers the user's question first, then appends a casual "by the way" disambiguation follow-up. Uses simple_btw.md prompt variant |
| Disambiguation Complex Text | Task-oriented disambiguation for complex requests where the ambiguous entity affects the result. Uses disambiguation_complex.md prompt |
| Disambiguation Acknowledgment | Brief static acknowledgment after the user answers a "by the way" disambiguation question. No LLM call |
| Shared Context Extractor | Extracts common context from state (facts, conversation history, presentation rules, modality) into a UIContext dataclass used by all LLM-calling nodes |
| Prompt Assembly | Loads markdown fragment files, combines core + variant fragments, and injects placeholders (facts, agent results, time, locale, etc.) |
| Response Streaming | Publishes SupervisorResponseChunkEvent messages to the event bus during LLM streaming; publishes SupervisorResponseCompleteEvent at the end |

Data Model

Content Authority Chain

The system uses a three-level authority hierarchy to determine what the LLM should prioritize:

| Level | Condition | Guidance to LLM |
| --- | --- | --- |
| 1: Agent Results | Agent responses exist and are non-empty | "Synthesize faithfully": agent data is the primary source of truth |
| 2: Conversation Context | No agent results, but conversation context exists | "Use as background": prior conversation informs the response |
| 3: Clarification | No agent results and no context | "Ask one clarifying question": don't guess, ask |
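
The three conditions above amount to a short decision function. The sketch below is illustrative only: the `Authority` enum, the function name, and the assumption that agent responses arrive as a mapping of agent name to text are not taken from the codebase.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Authority(IntEnum):
    """Hypothetical names for the three authority levels."""
    AGENT_RESULTS = 1
    CONVERSATION_CONTEXT = 2
    CLARIFICATION = 3


def select_authority(
    agent_responses: Optional[Mapping[str, str]],
    context_summary: Optional[str],
) -> Authority:
    """Return the highest-priority level whose condition holds."""
    # Level 1: at least one agent response exists and is non-empty.
    responses = (agent_responses or {}).values()
    if any(text and text.strip() for text in responses):
        return Authority.AGENT_RESULTS
    # Level 2: no agent results, but some conversation context exists.
    if context_summary and context_summary.strip():
        return Authority.CONVERSATION_CONTEXT
    # Level 3: nothing to rely on, so ask one clarifying question.
    return Authority.CLARIFICATION
```

Whitespace-only agent responses are treated as empty here, which is a choice made for the example rather than documented behavior.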

Prompt Fragment System

| Fragment | Purpose | Loaded When |
| --- | --- | --- |
| core.md | Identity, personality, anti-fabrication rules, next-step suggestions, language detection | Always (every response) |
| simple.md | Task instructions for direct Q&A | Simple chat route |
| complex.md | Agent synthesis guidance, card formatting rules | Complex chat route |
| voice.md | TTS optimization rules (no markdown, no emojis, natural transitions) | Voice modality |
| simple_btw.md | Answer + casual disambiguation follow-up | Non-blocking disambiguation |
| disambiguation_complex.md | Task-oriented disambiguation | Blocking disambiguation |
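
The loading rules in this table can be sketched as follows. The file names come from the table. The function names, the route keys, and the ordering (core first, then the variant, then voice.md last) are assumptions for illustration, as the real assembly order is not documented here.

```python
from pathlib import Path
from typing import List

# Route keys are hypothetical; file names are the ones listed above.
VARIANT_FRAGMENTS = {
    "simple": "simple.md",
    "complex": "complex.md",
    "disambiguation_simple": "simple_btw.md",
    "disambiguation_complex": "disambiguation_complex.md",
}


def fragment_names(route: str, modality: str = "text") -> List[str]:
    """List the fragment files for a route: core, variant, optional voice."""
    names = ["core.md", VARIANT_FRAGMENTS[route]]
    if modality == "voice":
        names.append("voice.md")
    return names


def assemble_prompt(prompt_dir: Path, route: str, modality: str = "text") -> str:
    """Read the selected fragments and join them with blank lines."""
    parts = [
        (prompt_dir / name).read_text(encoding="utf-8").strip()
        for name in fragment_names(route, modality)
    ]
    return "\n\n".join(parts)
```

An unknown route raises `KeyError` in this sketch, so a routing mistake fails loudly instead of silently producing a prompt with no task instructions.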

Prompt Placeholders

| Placeholder | Example Value | Injected By |
| --- | --- | --- |
| {{CURRENT_TIME}} | "2026-02-16T14:30:00Z" | Prompt assembly |
| {{USER_TIMEZONE}} | "Europe/Zurich" | Prompt assembly |
| {{USER_LOCALE}} | "de-CH" | Prompt assembly |
| {{PRESENTATION_POLICY}} | "Verbosity: concise. Tone: friendly." | User preferences |
| {{FACTS_BLOCK}} | Formatted personalization facts | Shared context extractor |
| {{CONTEXT_SUMMARY}} | Conversation history summary | Shared context extractor |
| {{AGENT_TEXT_SUMMARY}} | Flattened agent results | Complex text node |
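
Placeholder injection of this kind can be done with one regular-expression pass. The sketch below is an assumption about how such a step might look, not the project's prompt assembly code. The function name is hypothetical, and leaving unknown placeholders untouched is a choice made for the example.

```python
import re
from typing import Mapping

# Matches {{NAME}} where NAME is upper-case letters and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def inject_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace each {{NAME}} with values[NAME] in a single pass."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Unknown names stay visible so a missing value is easy to spot.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)
```

A single pass matters when injected values are user-derived: if a fact or agent result happens to contain `{{CURRENT_TIME}}`, it is not expanded a second time, which chained `str.replace` calls would do.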

Key Design Decisions

1. Specialized Nodes Over Monolithic UI Node

  • Chosen: Six separate node files, each handling one response variant
  • Rejected: Single user_interface_node with conditional branching
  • Rationale: The original monolithic node grew to handle simple, complex, voice, HITL, and disambiguation variants with deeply nested conditionals. Splitting into focused nodes makes each variant independently testable and modifiable. The old user_interface.py still exists but is deprecated

2. Fragment-Based Prompts Over Hard-Coded Strings

  • Chosen: Prompt content stored in .md files, assembled at runtime
  • Rejected: Python string templates, Jinja2 templates
  • Rationale: Non-technical stakeholders (product owners, content designers) can review and edit prompt files directly. Version control shows exact prompt text changes. Markdown is more readable than Python strings for long-form content

3. Streaming With Card Buffering

  • Chosen: Buffer a small window (30–150 chars) during streaming to detect and replace card placeholders inline
  • Rejected: Post-processing the full response after generation; sending cards as separate events
  • Rationale: Users expect immediate streaming. Post-processing would add seconds of perceived latency. The small buffer window is a compromise — most text streams instantly while card placeholders get replaced before the user sees them
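
The buffering idea in decision 3 can be illustrated with a small generator. This is a simplified sketch under stated assumptions: the `[[CARD:<id>]]` placeholder syntax, the function names, and the `cards` mapping are hypothetical. It applies only the 150-character upper bound as a hold limit and uses exact matching, so it does not reproduce the fuzzy matching the real implementation is described as using.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

CARD = re.compile(r"\[\[CARD:(\w+)\]\]")  # hypothetical placeholder syntax
MAX_HOLD = 150  # longest tail held back while waiting for a closing "]]"


def _drain(text: str, cards: Dict[str, str]) -> Tuple[str, str]:
    """Split text into (safe to emit now, tail to keep buffering)."""
    out, pos = [], 0
    # Replace every complete placeholder; unknown ids are left as-is.
    for match in CARD.finditer(text):
        out.append(text[pos:match.start()])
        out.append(cards.get(match.group(1), match.group(0)))
        pos = match.end()
    rest = text[pos:]
    # Hold back a tail that could still grow into a placeholder.
    hold_from = rest.rfind("[[")
    if hold_from != -1 and "]]" in rest[hold_from:]:
        hold_from = -1  # already closed and not a card: ordinary text
    if hold_from == -1 and rest.endswith("["):
        hold_from = len(rest) - 1
    if hold_from == -1 or len(rest) - hold_from > MAX_HOLD:
        return "".join(out) + rest, ""
    return "".join(out) + rest[:hold_from], rest[hold_from:]


def stream_with_cards(chunks: Iterable[str], cards: Dict[str, str]) -> Iterator[str]:
    """Yield text as it arrives, replacing card placeholders inline."""
    pending = ""
    for chunk in chunks:
        emit, pending = _drain(pending + chunk, cards)
        if emit:
            yield emit
    if pending:
        yield pending  # stream ended mid-placeholder: flush what is left
```

Ordinary text passes through with no delay. Only a tail that starts with an opening bracket is held, and it is released as soon as it is either completed or grows past the hold limit.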

Interfaces and Contracts

| Interface | Direction | Format | Consumer |
| --- | --- | --- | --- |
| Global Supervisor → UI Router | Inbound | Routing function selects node based on state | Global Supervisor graph edges |
| UI Nodes → LLM Adapter | Outbound | llm_adapter.stream_message_from_LLM() | LLM provider (streaming) |
| UI Nodes → Event Bus | Outbound | SupervisorResponseChunkEvent (per chunk), SupervisorResponseCompleteEvent (final) | Frontend SSE consumer |
| UI Nodes → Prompt Files | Inbound | Reads .md files from nodes/ui_helpers/prompts/ | Prompt assembly |
| UI Nodes → Message Persist | Outbound | Returns user_interface_response in state for persistence | Message Persist node |
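
The outbound contracts above (one chunk event per streamed piece, one complete event at the end, and `user_interface_response` returned in state) can be sketched end to end. Everything here is a stand-in: the event classes reuse the documented names but their fields are hypothetical, `fake_llm_stream` replaces `llm_adapter.stream_message_from_LLM()` whose real signature is not shown in this document, and `publish` is a plain callable rather than the actual event bus.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Dict, List


@dataclass
class SupervisorResponseChunkEvent:
    text: str  # hypothetical field


@dataclass
class SupervisorResponseCompleteEvent:
    full_text: str  # hypothetical field


async def fake_llm_stream(prompt: str) -> AsyncIterator[str]:
    """Stand-in for the LLM adapter's streaming call."""
    for piece in ("Hello", ", ", "world"):
        yield piece


async def run_text_node(prompt: str, publish: Callable[[object], None]) -> Dict[str, str]:
    """Stream a response, publish events, and return the state update."""
    parts: List[str] = []
    async for piece in fake_llm_stream(prompt):
        parts.append(piece)
        publish(SupervisorResponseChunkEvent(text=piece))  # one per chunk
    full_text = "".join(parts)
    publish(SupervisorResponseCompleteEvent(full_text=full_text))  # final
    # The Message Persist node reads this key from state.
    return {"user_interface_response": full_text}
```

A quick way to exercise the sketch is to collect events in a list: `events = []` followed by `asyncio.run(run_text_node("hi", events.append))`.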

Known Trade-offs and Debt

| Item | Impact | Remediation |
| --- | --- | --- |
| Deprecated user_interface.py | The old monolithic node still exists in the codebase. It is unused in normal flows but may cause confusion for new developers | Remove once all flows are confirmed working through the split nodes |
| Greeting frequency uses proxy | Uses chats.created_at as a proxy for "last greeting time" instead of a dedicated field. Multiple quick chats may all trigger greetings | Add avatars.last_greeting_at field if users report greeting fatigue |
| Card replacement regex complexity | The streaming card replacement uses a complex regex with fuzzy matching for LLM typos. This is fragile and hard to debug | Consider a structured card protocol (JSON tags) instead of regex-based replacement |
| Missing node-level test files | Test files referenced in docstrings (test_simple_text_behavior.py, etc.) do not exist in the test directory | Create dedicated tests for each specialized node |