UI Response System: Architecture
This content was migrated from Documentation/UI_NODE_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.
Context and Purpose
The UI Response System was refactored from a single monolithic user_interface_node into a set of specialized response nodes. The original node handled all response variants (simple, complex, voice, HITL, disambiguation) in one function, making it difficult to maintain and test.
The driving requirements behind the current architecture are:
- Separation of concerns — Each response variant has its own node with a focused responsibility, making changes to one variant independent of others
- Fragment-based prompts — Prompt content lives in editable markdown files, not Python strings, allowing non-developers to tune Swisper's voice
- Streaming-first — Responses stream word-by-word to minimize perceived latency, with card placeholder replacement handled during the stream
- Content authority — An explicit authority chain prevents the LLM from hallucinating when agent results are available
Architecture Overview
The UI Response System consists of six specialized nodes, a shared context extractor, a prompt assembly pipeline, and a streaming layer.
flowchart TB
subgraph Input["From Global Supervisor"]
STATE[GlobalSupervisorState]
end
subgraph Router["UI Router (routing.py)"]
STATE --> UR{Response Type?}
end
subgraph Nodes["Specialized Response Nodes"]
UR -->|simple chat| ST[Simple Text Node]
UR -->|complex chat| CT[Complex Text Node]
UR -->|HITL question| HT[HITL Text Node]
UR -->|non-blocking disambig| DST[Disambiguation Simple]
UR -->|blocking disambig| DCT[Disambiguation Complex]
UR -->|BTW resolved| DA[Disambiguation Ack]
end
subgraph Shared["Shared Infrastructure"]
SC[Shared Context Extractor]
PA[Prompt Assembly]
RS[Response Streaming]
end
ST --> SC
CT --> SC
DST --> SC
DCT --> SC
SC --> PA
PA --> LLM[LLM Call]
LLM --> RS
HT -->|no LLM| RS
DA -->|no LLM| RS
RS --> EVT[SupervisorResponseChunkEvent]
EVT --> FE[Frontend / Voice]
Flow summary: The UI Router selects a specialized node based on conversation context. Most nodes extract shared context, assemble a prompt from markdown fragments, call the LLM with streaming, and publish response chunks via the event bus. HITL and Acknowledgment nodes bypass the LLM entirely.
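The routing step in the flow summary can be sketched as a lookup from response type to node. This is a minimal illustration, not the contents of routing.py: the `ResponseType` enum, the node identifiers, and the `calls_llm` helper are all hypothetical names, and only the six edge labels come from the flowchart above.

```python
from enum import Enum


class ResponseType(Enum):
    # Values mirror the flowchart edge labels; the enum itself is hypothetical.
    SIMPLE_CHAT = "simple chat"
    COMPLEX_CHAT = "complex chat"
    HITL_QUESTION = "HITL question"
    NON_BLOCKING_DISAMBIG = "non-blocking disambig"
    BLOCKING_DISAMBIG = "blocking disambig"
    BTW_RESOLVED = "BTW resolved"


# Hypothetical node identifiers, one per specialized node.
NODE_FOR_TYPE = {
    ResponseType.SIMPLE_CHAT: "simple_text",
    ResponseType.COMPLEX_CHAT: "complex_text",
    ResponseType.HITL_QUESTION: "hitl_text",
    ResponseType.NON_BLOCKING_DISAMBIG: "disambiguation_simple_text",
    ResponseType.BLOCKING_DISAMBIG: "disambiguation_complex_text",
    ResponseType.BTW_RESOLVED: "disambiguation_ack",
}

# Nodes that stream a pre-determined or static message without an LLM call.
NO_LLM_NODES = {"hitl_text", "disambiguation_ack"}


def route(response_type: ResponseType) -> str:
    """Return the node that handles the given response type."""
    return NODE_FOR_TYPE[response_type]


def calls_llm(node_name: str) -> bool:
    """True for nodes that go through context extraction, prompt assembly, and the LLM."""
    return node_name not in NO_LLM_NODES
```

How the real router derives the response type from GlobalSupervisorState is not described here, so the sketch takes it as an input.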
Component Responsibilities
| Component | Responsibility |
|---|---|
| Simple Text Node | Direct conversational responses for queries that don't involve domain agents. Uses simple.md prompt variant |
| Complex Text Node | Synthesizes results from domain agents into a coherent response. Handles card placeholder replacement during streaming. Uses complex.md prompt variant |
| HITL Text Node | Formats pre-determined clarification questions from agents. Bypasses the LLM — streams the question directly |
| Disambiguation Simple Text | Answers the user's question first, then appends a casual "by the way" disambiguation follow-up. Uses simple_btw.md prompt variant |
| Disambiguation Complex Text | Task-oriented disambiguation for complex requests where the ambiguous entity affects the result. Uses disambiguation_complex.md prompt |
| Disambiguation Acknowledgment | Brief static acknowledgment after the user answers a "by the way" disambiguation question. No LLM call |
| Shared Context Extractor | Extracts common context from state (facts, conversation history, presentation rules, modality) into a UIContext dataclass used by all LLM-calling nodes |
| Prompt Assembly | Loads markdown fragment files, combines core + variant fragments, and injects placeholders (facts, agent results, time, locale, etc.) |
| Response Streaming | Publishes SupervisorResponseChunkEvent messages to the event bus during LLM streaming; publishes SupervisorResponseCompleteEvent at the end |
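As an illustration of the Shared Context Extractor row, the sketch below shows one possible shape for a UIContext dataclass. The four fields follow the categories listed in the table; the field names, types, defaults, and the state keys read by `extract_ui_context` are assumptions and may not match the real class.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping


@dataclass(frozen=True)
class UIContext:
    # Field names and types are hypothetical; only the four categories
    # (facts, history, presentation rules, modality) come from the table.
    facts: List[str] = field(default_factory=list)
    conversation_history: List[str] = field(default_factory=list)
    presentation_rules: str = ""
    modality: str = "text"


def extract_ui_context(state: Mapping[str, Any]) -> UIContext:
    """Build a UIContext from a state mapping, using hypothetical key names."""
    return UIContext(
        facts=list(state.get("facts", [])),
        conversation_history=list(state.get("conversation_history", [])),
        presentation_rules=state.get("presentation_rules", ""),
        modality=state.get("modality", "text"),
    )
```

Extracting the context once into an immutable object means every LLM-calling node can read the same view of the state.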
Data Model
Content Authority Chain
The system uses a three-level authority hierarchy to determine what the LLM should prioritize:
| Level | Condition | Guidance to LLM |
|---|---|---|
| 1 — Agent Results | Agent responses exist and are non-empty | "Synthesize faithfully" — agent data is the primary source of truth |
| 2 — Conversation Context | No agent results, but conversation context exists | "Use as background" — prior conversation informs the response |
| 3 — Clarification | No agent results and no context | "Ask one clarifying question" — don't guess, ask |
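The three conditions in the table can be read as a simple precedence check. The following is a minimal sketch under that reading; `AuthorityLevel`, `GUIDANCE`, and `resolve_authority` are hypothetical names, and treating whitespace-only strings as empty is an assumption.

```python
from enum import IntEnum
from typing import Sequence


class AuthorityLevel(IntEnum):
    AGENT_RESULTS = 1
    CONVERSATION_CONTEXT = 2
    CLARIFICATION = 3


# Guidance strings quoted from the authority table.
GUIDANCE = {
    AuthorityLevel.AGENT_RESULTS: "Synthesize faithfully",
    AuthorityLevel.CONVERSATION_CONTEXT: "Use as background",
    AuthorityLevel.CLARIFICATION: "Ask one clarifying question",
}


def resolve_authority(agent_results: Sequence[str], context_summary: str) -> AuthorityLevel:
    """Pick the highest authority level whose condition holds."""
    # Level 1: at least one agent response exists and is non-empty.
    if any(result.strip() for result in agent_results):
        return AuthorityLevel.AGENT_RESULTS
    # Level 2: no agent results, but conversation context exists.
    if context_summary.strip():
        return AuthorityLevel.CONVERSATION_CONTEXT
    # Level 3: nothing to ground the answer in, so ask instead of guessing.
    return AuthorityLevel.CLARIFICATION
```

For example, `resolve_authority(["", "  "], "User asked about trains")` falls through to level 2 because every agent result is empty.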
Prompt Fragment System
| Fragment | Purpose | Loaded When |
|---|---|---|
| core.md | Identity, personality, anti-fabrication rules, next-step suggestions, language detection | Always (every response) |
| simple.md | Task instructions for direct Q&A | Simple chat route |
| complex.md | Agent synthesis guidance, card formatting rules | Complex chat route |
| voice.md | TTS optimization rules (no markdown, no emojis, natural transitions) | Voice modality |
| simple_btw.md | Answer + casual disambiguation follow-up | Non-blocking disambiguation |
| disambiguation_complex.md | Task-oriented disambiguation | Blocking disambiguation |
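The "Loaded When" column amounts to a selection rule: core.md always, one variant per route, and voice.md when the modality is voice. The sketch below shows that rule only. The route keys, the `select_fragments` name, and the ordering of the fragments are assumptions; the file names come from the table.

```python
from typing import List

# Route keys are hypothetical; file names are taken from the fragment table.
FRAGMENT_FOR_ROUTE = {
    "simple": "simple.md",
    "complex": "complex.md",
    "disambiguation_simple": "simple_btw.md",
    "disambiguation_complex": "disambiguation_complex.md",
}


def select_fragments(route: str, modality: str = "text") -> List[str]:
    """Return the fragment files to combine for one response."""
    fragments = ["core.md"]  # loaded for every response
    fragments.append(FRAGMENT_FOR_ROUTE[route])  # exactly one route variant
    if modality == "voice":
        fragments.append("voice.md")  # TTS rules only for voice output
    return fragments


print(select_fragments("complex", "voice"))
# ['core.md', 'complex.md', 'voice.md']
```

An unknown route raises `KeyError` in this sketch, which makes a routing mistake visible instead of silently sending only the core prompt.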
Prompt Placeholders
| Placeholder | Example Value | Injected By |
|---|---|---|
| {{CURRENT_TIME}} | "2026-02-16T14:30:00Z" | Prompt assembly |
| {{USER_TIMEZONE}} | "Europe/Zurich" | Prompt assembly |
| {{USER_LOCALE}} | "de-CH" | Prompt assembly |
| {{PRESENTATION_POLICY}} | "Verbosity: concise. Tone: friendly." | User preferences |
| {{FACTS_BLOCK}} | Formatted personalization facts | Shared context extractor |
| {{CONTEXT_SUMMARY}} | Conversation history summary | Shared context extractor |
| {{AGENT_TEXT_SUMMARY}} | Flattened agent results | Complex text node |
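Placeholder injection of this kind is typically a plain string substitution over the assembled prompt. The sketch below assumes the `{{NAME}}` syntax shown in the table and uses a hypothetical `inject_placeholders` helper. Leaving unknown placeholders untouched is a design choice of this example, not a documented behavior of the system.

```python
import re
from typing import Dict

# Matches {{UPPER_SNAKE_CASE}} tokens such as {{USER_LOCALE}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def inject_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace each {{NAME}} in the template with values[NAME] when present."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Keep the raw token if no value was supplied, so the gap is visible.
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, template)


prompt = inject_placeholders(
    "Locale: {{USER_LOCALE}}. Timezone: {{USER_TIMEZONE}}.",
    {"USER_LOCALE": "de-CH", "USER_TIMEZONE": "Europe/Zurich"},
)
print(prompt)
# Locale: de-CH. Timezone: Europe/Zurich.
```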
Key Design Decisions
1. Specialized Nodes Over Monolithic UI Node
- Chosen: Six separate node files, each handling one response variant
- Rejected: Single user_interface_node with conditional branching
- Rationale: The original monolithic node grew to handle simple, complex, voice, HITL, and disambiguation variants with deeply nested conditionals. Splitting into focused nodes makes each variant independently testable and modifiable. The old user_interface.py still exists but is deprecated
2. Fragment-Based Prompts Over Hard-Coded Strings
- Chosen: Prompt content stored in .md files, assembled at runtime
- Rejected: Python string templates, Jinja2 templates
- Rationale: Non-technical stakeholders (product owners, content designers) can review and edit prompt files directly. Version control shows exact prompt text changes. Markdown is more readable than Python strings for long-form content
3. Streaming With Card Buffering
- Chosen: Buffer a small window (30–150 chars) during streaming to detect and replace card placeholders inline
- Rejected: Post-processing the full response after generation; sending cards as separate events
- Rationale: Users expect immediate streaming. Post-processing would add seconds of perceived latency. The small buffer window is a compromise — most text streams instantly while card placeholders get replaced before the user sees them
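The buffering idea can be sketched as a generator that forwards text immediately and holds back only a tail that might be the start of a card placeholder. This is an illustration under stated assumptions: the `[[CARD:<id>]]` syntax, the `stream_with_cards` name, and the use of 150 characters as a hold limit are hypothetical, and the fuzzy matching for LLM typos is omitted.

```python
import re
from typing import Dict, Iterable, Iterator

CARD_RE = re.compile(r"\[\[CARD:(\w+)\]\]")  # hypothetical placeholder syntax
MAX_HOLD = 150  # assumed upper bound of the buffering window, in characters


def stream_with_cards(chunks: Iterable[str], cards: Dict[str, str]) -> Iterator[str]:
    """Yield text as it arrives, replacing complete card placeholders inline."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit everything up to and including each complete placeholder.
        while (match := CARD_RE.search(buffer)):
            if match.start():
                yield buffer[: match.start()]
            rendered = cards.get(match.group(1))
            if rendered:  # unknown card ids are dropped in this sketch
                yield rendered
            buffer = buffer[match.end():]
        # Hold back a tail that could be an unfinished placeholder.
        hold_from = buffer.find("[[")
        if hold_from == -1:
            hold_from = len(buffer) - 1 if buffer.endswith("[") else len(buffer)
        elif len(buffer) - hold_from > MAX_HOLD:
            hold_from = len(buffer)  # too long to be a placeholder: flush as text
        if hold_from:
            yield buffer[:hold_from]
            buffer = buffer[hold_from:]
    if buffer:
        yield buffer  # flush whatever is left when the stream ends


pieces = list(stream_with_cards(
    ["Here is your ", "flight: [[CA", "RD:f1]] Enjoy", "!"],
    {"f1": "<card f1>"},
))
print("".join(pieces))
# Here is your flight: <card f1> Enjoy!
```

Text without an opening marker passes through with no delay, so only responses that contain a placeholder pay the buffering cost.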
Interfaces and Contracts
| Interface | Direction | Format | Consumer |
|---|---|---|---|
| Global Supervisor → UI Router | Inbound | Routing function selects node based on state | Global Supervisor graph edges |
| UI Nodes → LLM Adapter | Outbound | llm_adapter.stream_message_from_LLM() | LLM provider (streaming) |
| UI Nodes → Event Bus | Outbound | SupervisorResponseChunkEvent (per chunk), SupervisorResponseCompleteEvent (final) | Frontend SSE consumer |
| UI Nodes → Prompt Files | Inbound | Reads .md files from nodes/ui_helpers/prompts/ | Prompt assembly |
| UI Nodes → Message Persist | Outbound | Returns user_interface_response in state for persistence | Message Persist node |
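The event bus and persistence contracts can be pictured together: one chunk event per streamed piece, one complete event at the end, and the full text returned under user_interface_response. The sketch below uses stand-in classes that share the event names from the table. Their fields, the `publish` callback, and the `stream_response` function are assumptions, and the real event bus API is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Union


@dataclass
class SupervisorResponseChunkEvent:
    text: str  # hypothetical field; the real event schema may differ


@dataclass
class SupervisorResponseCompleteEvent:
    full_text: str  # hypothetical field; the real event schema may differ


Event = Union[SupervisorResponseChunkEvent, SupervisorResponseCompleteEvent]


def stream_response(chunks: Iterable[str], publish: Callable[[Event], None]) -> Dict[str, str]:
    """Publish one event per chunk, then a final event, and return the state update."""
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        publish(SupervisorResponseChunkEvent(text=chunk))
    full_text = "".join(parts)
    publish(SupervisorResponseCompleteEvent(full_text=full_text))
    # Returned as a state update so a persistence step can store the message.
    return {"user_interface_response": full_text}


events: List[Event] = []
state_update = stream_response(["Hel", "lo"], events.append)
print(state_update)
# {'user_interface_response': 'Hello'}
```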
Known Trade-offs and Debt
| Item | Impact | Remediation |
|---|---|---|
| Deprecated user_interface.py | The old monolithic node still exists in the codebase. It is unused in normal flows but may cause confusion for new developers | Remove once all flows are confirmed working through the split nodes |
| Greeting frequency uses proxy | Uses chats.created_at as a proxy for "last greeting time" instead of a dedicated field. Multiple quick chats may all trigger greetings | Add avatars.last_greeting_at field if users report greeting fatigue |
| Card replacement regex complexity | The streaming card replacement uses a complex regex with fuzzy matching for LLM typos. This is fragile and hard to debug | Consider a structured card protocol (JSON tags) instead of regex-based replacement |
| Missing node-level test files | Test files referenced in docstrings (test_simple_text_behavior.py, etc.) do not exist in the test directory | Create dedicated tests for each specialized node |