UI Response System: Architecture

This content was migrated from Documentation/UI_NODE_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The UI Response System was refactored from a single monolithic user_interface_node into a set of specialized response nodes. The original node handled every response variant (simple, complex, voice, HITL, disambiguation) in one function, which made it difficult to maintain and test.

The driving requirements behind the current architecture are:

Separation of concerns — Each response variant has its own node with a focused responsibility, making changes to one variant independent of others
Fragment-based prompts — Prompt content lives in editable markdown files, not Python strings, allowing non-developers to tune Swisper's voice
Streaming-first — Responses stream word-by-word to minimize perceived latency, with card placeholder replacement handled during the stream
Content authority — An explicit authority chain prevents the LLM from hallucinating when agent results are available

Architecture Overview

The UI Response System consists of six specialized nodes, a shared context extractor, a prompt assembly pipeline, and a streaming layer.

flowchart TB
    subgraph Input["From Global Supervisor"]
        STATE[GlobalSupervisorState]
    end

    subgraph Router["UI Router (routing.py)"]
        STATE --> UR{Response Type?}
    end

    subgraph Nodes["Specialized Response Nodes"]
        UR -->|simple chat| ST[Simple Text Node]
        UR -->|complex chat| CT[Complex Text Node]
        UR -->|HITL question| HT[HITL Text Node]
        UR -->|non-blocking disambig| DST[Disambiguation Simple]
        UR -->|blocking disambig| DCT[Disambiguation Complex]
        UR -->|BTW resolved| DA[Disambiguation Ack]
    end

    subgraph Shared["Shared Infrastructure"]
        SC[Shared Context Extractor]
        PA[Prompt Assembly]
        RS[Response Streaming]
    end

    ST --> SC
    CT --> SC
    DST --> SC
    DCT --> SC
    SC --> PA
    PA --> LLM[LLM Call]
    LLM --> RS

    HT -->|no LLM| RS
    DA -->|no LLM| RS

    RS --> EVT[SupervisorResponseChunkEvent]
    EVT --> FE[Frontend / Voice]

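Flow summary: The UI Router selects a specialized node based on conversation context. The four LLM-calling nodes (Simple Text, Complex Text, Disambiguation Simple, Disambiguation Complex) extract shared context, assemble a prompt from markdown fragments, call the LLM with streaming, and publish response chunks via the event bus. The HITL Text and Disambiguation Acknowledgment nodes bypass the LLM entirely and go straight to response streaming.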
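
The routing step can be pictured as a pure function over the supervisor state. The following is a minimal sketch only: the state keys (`hitl_question`, `btw_resolved`, `disambiguation`, `agent_responses`) and the returned node names are hypothetical and are not taken from routing.py, and the precedence order of the checks is likewise an assumption.

```python
from typing import Any, Mapping


def route_ui_response(state: Mapping[str, Any]) -> str:
    """Pick a response node name from state (all keys and names are hypothetical)."""
    # A pre-determined question from an agent needs no LLM call.
    if state.get("hitl_question"):
        return "hitl_text"
    # The user just answered a "by the way" question: static acknowledgment.
    if state.get("btw_resolved"):
        return "disambiguation_ack"
    disambiguation = state.get("disambiguation")
    if disambiguation:
        # Blocking ambiguity affects the result; non-blocking is a follow-up.
        if disambiguation.get("blocking"):
            return "disambiguation_complex"
        return "disambiguation_simple"
    # Agent results present: synthesize them; otherwise plain conversation.
    if state.get("agent_responses"):
        return "complex_text"
    return "simple_text"
```

Keeping the router free of side effects makes each branch easy to cover with a plain unit test, for example `route_ui_response({"agent_responses": ["..."]})`.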

Component Responsibilities

Component	Responsibility
Simple Text Node	Direct conversational responses for queries that don't involve domain agents. Uses `simple.md` prompt variant
Complex Text Node	Synthesizes results from domain agents into a coherent response. Handles card placeholder replacement during streaming. Uses `complex.md` prompt variant
HITL Text Node	Formats pre-determined clarification questions from agents. Bypasses the LLM — streams the question directly
Disambiguation Simple Text	Answers the user's question first, then appends a casual "by the way" disambiguation follow-up. Uses `simple_btw.md` prompt variant
Disambiguation Complex Text	Task-oriented disambiguation for complex requests where the ambiguous entity affects the result. Uses `disambiguation_complex.md` prompt
Disambiguation Acknowledgment	Brief static acknowledgment after the user answers a "by the way" disambiguation question. No LLM call
Shared Context Extractor	Extracts common context from state (facts, conversation history, presentation rules, modality) into a `UIContext` dataclass used by all LLM-calling nodes
Prompt Assembly	Loads markdown fragment files, combines core + variant fragments, and injects placeholders (facts, agent results, time, locale, etc.)
Response Streaming	Publishes `SupervisorResponseChunkEvent` messages to the event bus during LLM streaming; publishes `SupervisorResponseCompleteEvent` at the end
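
To make the Shared Context Extractor row more concrete, here is a rough sketch of what a `UIContext` dataclass and its extractor might look like. Only the class name `UIContext` and the four kinds of context come from the table above; the field names, the state keys, and the function name `extract_ui_context` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class UIContext:
    """Common context for LLM-calling nodes (field names are hypothetical)."""
    facts: Tuple[str, ...] = ()
    conversation_summary: str = ""
    presentation_policy: str = ""
    modality: str = "text"


def extract_ui_context(state: Mapping[str, Any]) -> UIContext:
    """Build a UIContext from state, tolerating missing or None values."""
    return UIContext(
        facts=tuple(state.get("facts") or ()),
        conversation_summary=state.get("context_summary") or "",
        presentation_policy=state.get("presentation_policy") or "",
        modality=state.get("modality") or "text",
    )
```

A frozen dataclass is used here so that nodes cannot mutate shared context by accident; whether the real `UIContext` is frozen is not stated in this document.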

Data Model

Content Authority Chain

The system uses a three-level authority hierarchy to determine what the LLM should prioritize:

Level	Condition	Guidance to LLM
1 — Agent Results	Agent responses exist and are non-empty	"Synthesize faithfully" — agent data is the primary source of truth
2 — Conversation Context	No agent results, but conversation context exists	"Use as background" — prior conversation informs the response
3 — Clarification	No agent results and no context	"Ask one clarifying question" — don't guess, ask
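
The three conditions in the table translate directly into a small selection function. This is a sketch under assumptions: the names `Authority`, `GUIDANCE`, and `select_authority` are hypothetical, and "non-empty" is interpreted here as "contains at least one non-whitespace response".

```python
from enum import IntEnum
from typing import Sequence


class Authority(IntEnum):
    AGENT_RESULTS = 1
    CONVERSATION_CONTEXT = 2
    CLARIFICATION = 3


# Guidance labels quoted from the table above.
GUIDANCE = {
    Authority.AGENT_RESULTS: "Synthesize faithfully",
    Authority.CONVERSATION_CONTEXT: "Use as background",
    Authority.CLARIFICATION: "Ask one clarifying question",
}


def select_authority(agent_responses: Sequence[str], context_summary: str) -> Authority:
    """Return the highest-priority authority level whose condition holds."""
    if any(response.strip() for response in agent_responses):
        return Authority.AGENT_RESULTS
    if context_summary.strip():
        return Authority.CONVERSATION_CONTEXT
    return Authority.CLARIFICATION
```

Because the levels are checked in order, agent results always win over conversation context when both are present.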

Prompt Fragment System

Fragment	Purpose	Loaded When
`core.md`	Identity, personality, anti-fabrication rules, next-step suggestions, language detection	Always (every response)
`simple.md`	Task instructions for direct Q&A	Simple chat route
`complex.md`	Agent synthesis guidance, card formatting rules	Complex chat route
`voice.md`	TTS optimization rules (no markdown, no emojis, natural transitions)	Voice modality
`simple_btw.md`	Answer + casual disambiguation follow-up	Non-blocking disambiguation
`disambiguation_complex.md`	Task-oriented disambiguation	Blocking disambiguation
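
The "Loaded When" column can be read as a lookup from route and modality to a list of fragment files. The sketch below illustrates that reading; the route keys, the function names, and the choice to append `voice.md` after the variant fragment are assumptions, while the file names come from the table.

```python
from pathlib import Path
from typing import List

# Route keys are hypothetical; file names are taken from the fragment table.
VARIANT_BY_ROUTE = {
    "simple": "simple.md",
    "complex": "complex.md",
    "disambiguation_simple": "simple_btw.md",
    "disambiguation_complex": "disambiguation_complex.md",
}


def fragment_names(route: str, modality: str) -> List[str]:
    """core.md is always loaded, then the route variant, then voice.md if needed."""
    names = ["core.md", VARIANT_BY_ROUTE[route]]
    if modality == "voice":
        names.append("voice.md")
    return names


def load_prompt(prompt_dir: Path, route: str, modality: str) -> str:
    """Read the selected fragments and join them with blank lines."""
    parts = [
        (prompt_dir / name).read_text(encoding="utf-8").strip()
        for name in fragment_names(route, modality)
    ]
    return "\n\n".join(parts)
```

Routes that never call the LLM (HITL, acknowledgment) have no entry in the mapping, so asking for their fragments raises a `KeyError` in this sketch rather than silently loading a default.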

Prompt Placeholders

Placeholder	Example Value	Injected By
`{{CURRENT_TIME}}`	`"2026-02-16T14:30:00Z"`	Prompt assembly
`{{USER_TIMEZONE}}`	`"Europe/Zurich"`	Prompt assembly
`{{USER_LOCALE}}`	`"de-CH"`	Prompt assembly
`{{PRESENTATION_POLICY}}`	`"Verbosity: concise. Tone: friendly."`	User preferences
`{{FACTS_BLOCK}}`	Formatted personalization facts	Shared context extractor
`{{CONTEXT_SUMMARY}}`	Conversation history summary	Shared context extractor
`{{AGENT_TEXT_SUMMARY}}`	Flattened agent results	Complex text node
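
Placeholder injection of this kind is usually a single regex substitution. The following sketch assumes that placeholders are exactly `{{UPPER_SNAKE_CASE}}` and that an unfilled placeholder should be treated as an error; both the function name `inject_placeholders` and that error policy are assumptions, not documented behavior.

```python
import re
from typing import Mapping

_PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def inject_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace {{NAME}} tokens with values; raise if any token has no value."""
    missing = set()

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            missing.add(name)
            return match.group(0)  # leave the token in place for the error path
        return values[name]

    rendered = _PLACEHOLDER.sub(_substitute, template)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(missing)}")
    return rendered
```

For example, `inject_placeholders("Locale: {{USER_LOCALE}}", {"USER_LOCALE": "de-CH"})` yields `"Locale: de-CH"`. Failing loudly on a missing value avoids sending a literal `{{FACTS_BLOCK}}` token to the LLM.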

Key Design Decisions

1. Specialized Nodes Over Monolithic UI Node

Chosen: Six separate node files, each handling one response variant
Rejected: Single user_interface_node with conditional branching
Rationale: The original monolithic node grew to handle simple, complex, voice, HITL, and disambiguation variants through deeply nested conditionals. Splitting it into focused nodes makes each variant independently testable and modifiable. The old user_interface.py remains in the codebase but is deprecated.

2. Fragment-Based Prompts Over Hard-Coded Strings

Chosen: Prompt content stored in .md files, assembled at runtime
Rejected: Python string templates, Jinja2 templates
Rationale: Non-technical stakeholders (product owners, content designers) can review and edit prompt files directly. Version control shows exact prompt text changes. Markdown is more readable than Python strings for long-form content

3. Streaming With Card Buffering

Chosen: Buffer a small window (30–150 chars) during streaming to detect and replace card placeholders inline
Rejected: Post-processing the full response after generation; sending cards as separate events
Rationale: Users expect immediate streaming. Post-processing would add seconds of perceived latency. The small buffer window is a compromise — most text streams instantly while card placeholders get replaced before the user sees them
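
The buffering idea can be illustrated with a small generator that passes text through immediately and holds back only a tail that might be the start of a card placeholder. This is a simplified sketch, not the production logic: the placeholder syntax `[[CARD:id]]` is hypothetical, matching is exact rather than fuzzy, and only the 150-character upper bound of the window is modeled.

```python
import re
from typing import Iterable, Iterator, Mapping

CARD = re.compile(r"\[\[CARD:(\w+)\]\]")  # hypothetical placeholder syntax
MAX_HOLD = 150  # upper bound of the buffer window, in characters


def _safe_cut(buffer: str) -> int:
    """Index up to which text can be emitted without splitting a placeholder."""
    start = buffer.rfind("[[")
    if start != -1 and "]]" not in buffer[start:] and len(buffer) - start <= MAX_HOLD:
        return start  # an opened placeholder is still waiting for its close
    if buffer.endswith("["):
        return len(buffer) - 1  # could be the first half of "[["
    return len(buffer)


def stream_with_cards(chunks: Iterable[str], cards: Mapping[str, str]) -> Iterator[str]:
    """Yield text as it arrives, replacing complete placeholders inline."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Unknown card ids are left untouched rather than dropped.
        buffer = CARD.sub(lambda m: cards.get(m.group(1), m.group(0)), buffer)
        cut = _safe_cut(buffer)
        if cut:
            yield buffer[:cut]
            buffer = buffer[cut:]
    if buffer:
        yield buffer  # flush whatever is left when the stream ends
```

With chunks `["Here is ", "your flight [[CA", "RD:f1]] and", " more."]`, the text before the opening brackets streams at once, and the card markup is emitted only after the closing brackets arrive.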

Interfaces and Contracts

Interface	Direction	Format	Consumer
Global Supervisor → UI Router	Inbound	Routing function selects node based on state	Global Supervisor graph edges
UI Nodes → LLM Adapter	Outbound	`llm_adapter.stream_message_from_LLM()`	LLM provider (streaming)
UI Nodes → Event Bus	Outbound	`SupervisorResponseChunkEvent` (per chunk), `SupervisorResponseCompleteEvent` (final)	Frontend SSE consumer
UI Nodes → Prompt Files	Inbound	Reads `.md` files from `nodes/ui_helpers/prompts/`	Prompt assembly
UI Nodes → Message Persist	Outbound	Returns `user_interface_response` in state for persistence	Message Persist node

Known Trade-offs and Debt

Item	Impact	Remediation
Deprecated `user_interface.py`	The old monolithic node still exists in the codebase. It is unused in normal flows but may cause confusion for new developers	Remove once all flows are confirmed working through the split nodes
Greeting frequency uses proxy	Uses `chats.created_at` as a proxy for "last greeting time" instead of a dedicated field. Multiple quick chats may all trigger greetings	Add `avatars.last_greeting_at` field if users report greeting fatigue
Card replacement regex complexity	The streaming card replacement uses a complex regex with fuzzy matching for LLM typos. This is fragile and hard to debug	Consider a structured card protocol (JSON tags) instead of regex-based replacement
Missing node-level test files	Test files referenced in docstrings (`test_simple_text_behavior.py`, etc.) do not exist in the test directory	Create dedicated tests for each specialized node