UI Response System — Overview

This content was migrated from Documentation/UI_NODE_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.

What This Module Does

The UI Response System is the last step in every Swisper conversation — the part that actually writes the message the user sees. After the Global Supervisor has classified the user's intent, retrieved relevant memories, and (when needed) run domain agents, the UI Response System takes all of that work and turns it into a natural, personalized reply.

It adapts its output based on context: a simple "what's the weather?" gets a direct answer, while a complex request involving multiple agents gets a synthesized summary that ties everything together. It also handles voice conversations, reformatting responses so they sound natural when spoken aloud instead of read on screen.

The system uses a fragment-based prompt architecture — the instructions for the AI are stored as editable markdown files rather than being hard-coded in the application. This means the way Swisper communicates (its tone, personality, and rules) can be updated without touching the application code.
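As a rough illustration of the fragment idea, the sketch below joins a shared core fragment with one variant fragment read from disk. The function name `compose_prompt`, the directory layout, and the rule "core plus exactly one variant" are assumptions made for this example; only the file names come from this page.

```python
from pathlib import Path

# Variant fragments named on this page; how they are combined is an assumption.
KNOWN_VARIANTS = {"simple", "complex", "voice"}


def compose_prompt(fragment_dir: Path, variant: str) -> str:
    """Join core.md with one variant fragment (hypothetical helper)."""
    if variant not in KNOWN_VARIANTS:
        raise ValueError(f"unknown prompt variant: {variant!r}")
    parts = []
    for name in ("core.md", f"{variant}.md"):
        # Each fragment is plain markdown, editable without code changes.
        parts.append((fragment_dir / name).read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)
```

Because the fragments are read at call time in this sketch, an edit to a markdown file would show up in the next composed prompt without any change to the Python code.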

Who It Serves

| Persona | Need |
| --- | --- |
| End Users | Natural, coherent responses that feel personal and address their actual request — whether typed or spoken |
| Product Owners | Control over how Swisper communicates — its personality, tone, and response style — through editable prompt files |
| Content Designers | Ability to tune Swisper's voice and behavior by editing markdown prompt templates rather than writing code |

Key Capabilities

  • Response variant routing — Automatically selects the right response style: simple (direct Q&A), complex (multi-agent synthesis), HITL (clarification questions), or disambiguation (entity follow-ups)
  • Personalization through facts — Weaves relevant personal facts (name, preferences, dietary restrictions, travel plans) naturally into responses rather than listing them robotically
  • Progressive streaming — Sends response text to the user word-by-word as it's generated, so the user sees the first words within about 200ms instead of waiting for the full response
  • Voice optimization — Strips markdown formatting, emojis, and bullet points from responses destined for text-to-speech, replacing them with natural spoken transitions
  • Smart greeting control — Always greets first-time users, and greets returning users only when at least 4 hours have passed since their last greeting
  • Content authority chain — Ensures the AI prioritizes agent results over conversation context over its own knowledge, preventing hallucination about things agents actually looked up
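One way to picture the content authority chain from the last bullet is as an ordering of context sections handed to the model. The sketch below is only an assumption about how such an ordering could be assembled; the function name `build_context`, the section titles, and the closing instruction are all hypothetical.

```python
from typing import List, Tuple


def build_context(agent_results: str, conversation: str) -> str:
    """Order context by authority: agent results, then conversation (sketch)."""
    # Highest-authority source first; empty sources are skipped.
    sources: List[Tuple[str, str]] = [
        ("Agent results (authoritative)", agent_results),
        ("Conversation context", conversation),
    ]
    sections = [
        f"{title}:\n{body.strip()}" for title, body in sources if body.strip()
    ]
    # The model's own knowledge comes last in the chain.
    sections.append(
        "Use general knowledge only for details the sections above do not cover."
    )
    return "\n\n".join(sections)
```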

How It Fits in the Platform

The UI Response System sits at the end of the Global Supervisor pipeline:

  • Upstream: Receives processed state from the Global Supervisor — including intent classification, memory context, and agent execution results
  • Downstream: Streams response chunks to the frontend via Server-Sent Events (SupervisorResponseChunkEvent), which renders them in the chat UI or sends them to the voice system for text-to-speech
  • Prompt assets: Reads prompt fragment files (core.md, simple.md, complex.md, voice.md) that define Swisper's personality and response rules
  • Persistence: Hands off the final response to the Message Persist node, which saves it to the database
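The downstream streaming step can be sketched as a generator that wraps each text chunk in a Server-Sent Events frame. The payload fields (`index`, `text`) and the use of `SupervisorResponseChunkEvent` as the SSE event name are assumptions for illustration; the actual event schema is not described on this page.

```python
import json
from typing import Iterable, Iterator


def sse_frames(
    chunks: Iterable[str], event: str = "SupervisorResponseChunkEvent"
) -> Iterator[str]:
    """Yield one SSE frame per response chunk (hypothetical payload shape)."""
    for index, text in enumerate(chunks):
        # JSON encoding keeps newlines inside the chunk from breaking the frame.
        payload = json.dumps({"index": index, "text": text})
        # An SSE frame is a set of "field: value" lines ended by a blank line.
        yield f"event: {event}\ndata: {payload}\n\n"
```

A frontend would read these frames in order and append each `text` value to the message being rendered.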

Limits and Edge Cases

  • Prompt-dependent quality — Response quality is directly tied to the quality of the prompt markdown files. Poorly written prompts lead to poor responses regardless of how good the upstream processing was
  • Card replacement latency — In complex responses, the system buffers output to detect and replace card placeholders (structured data from agents). This adds a small delay to streaming for complex responses
  • Voice mode limitations — Voice optimization strips all formatting, which means structured data (tables, lists, code) loses its visual organization when spoken. The system does its best with natural language transitions, but some content is inherently better suited to text
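The buffering behind the card replacement latency noted above can be sketched as follows. The placeholder syntax `[[card:ID]]`, the `cards` lookup, and the function names are hypothetical; the point is only that text which might be the start of a placeholder has to be held back until the placeholder is complete, which is where the extra delay comes from.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical placeholder syntax; the real format is not given on this page.
PLACEHOLDER = re.compile(r"\[\[card:(\w+)\]\]")


def _safe_cut(buffer: str) -> int:
    """Return how many leading characters can be emitted safely."""
    start = buffer.rfind("[[")
    if start != -1 and "]]" not in buffer[start:]:
        return start  # an opened placeholder is still incomplete
    if buffer.endswith("["):
        return len(buffer) - 1  # might be the first half of "[["
    return len(buffer)


def stream_with_cards(chunks: Iterable[str], cards: Dict[str, str]) -> Iterator[str]:
    """Stream text, swapping complete placeholders for rendered cards."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Replace complete placeholders; unknown IDs are left untouched.
        buffer = PLACEHOLDER.sub(lambda m: cards.get(m.group(1), m.group(0)), buffer)
        cut = _safe_cut(buffer)
        if cut:
            yield buffer[:cut]
            buffer = buffer[cut:]
    if buffer:
        yield buffer  # flush whatever is left when the stream ends
```

In this sketch, text without an opening bracket passes straight through, so the delay only applies around placeholders.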

FAQ

Q: How does Swisper decide between a "simple" and "complex" response? A: The intent classification step (upstream in the Global Supervisor) determines the route. If no domain agents were needed, the simple text node generates a direct response. If agents ran and produced results, the complex text node synthesizes those results into a unified answer.
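A minimal sketch of the simple versus complex decision described in this answer, assuming the upstream state exposes a list of agent results. The names `Variant` and `select_text_node` are hypothetical, and the triggers for the HITL and disambiguation variants are left out because this page does not state them.

```python
from enum import Enum
from typing import Sequence


class Variant(str, Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    HITL = "hitl"                      # trigger not described here
    DISAMBIGUATION = "disambiguation"  # trigger not described here


def select_text_node(agent_results: Sequence[str]) -> Variant:
    """Pick the complex node when agents produced results, else the simple one."""
    return Variant.COMPLEX if agent_results else Variant.SIMPLE
```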

Q: Can we change Swisper's personality or tone without deploying code? A: Yes. The prompt files (core.md, simple.md, complex.md, voice.md) are markdown assets that define how Swisper communicates. Editing these files changes Swisper's behavior — no Python code changes needed. The core prompt file controls identity, personality, and fundamental rules.

Q: Why does Swisper sometimes say "by the way, which Sophie did you mean?" A: This is the disambiguation response variant. When the system detects an ambiguous entity that does not block the current answer, it gives the full answer first and then casually asks the user to clarify for future reference.

Q: How are greeting messages controlled? A: A frequency check prevents greeting fatigue. First-time users always get an intro. Returning users get a greeting if at least 4 hours have passed since their last one. Within that window, greetings are skipped. The system uses chat creation timestamps as a proxy for the time of the last greeting, so no dedicated database field is needed.
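The greeting rule in this answer can be sketched as a single comparison. The function name `should_greet` and the choice of the most recent previous chat's creation time as the proxy are assumptions; only the 4-hour threshold and the first-time-user rule come from this page.

```python
from datetime import datetime, timedelta
from typing import Optional

GREETING_INTERVAL = timedelta(hours=4)  # threshold stated on this page


def should_greet(last_chat_created_at: Optional[datetime], now: datetime) -> bool:
    """Decide whether to greet, using a chat creation time as the proxy."""
    if last_chat_created_at is None:
        return True  # first-time user: always show the intro
    # Greet only when at least 4 hours have passed.
    return now - last_chat_created_at >= GREETING_INTERVAL
```

Both timestamps should be either timezone-aware or naive, since Python raises a `TypeError` when subtracting one kind from the other.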

Q: What happens when Swisper responds in voice mode? A: The same response pipeline runs, but with the "voice" prompt variant. This variant instructs the AI to avoid markdown, emojis, and bullet points. After generation, a voice optimizer post-processes the text to ensure it sounds natural when spoken by the text-to-speech engine.
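As an illustration of the post-processing step, the sketch below strips common markdown markers, bullets, and a range of emoji, then joins the remaining lines into sentences. It is an assumption about what such an optimizer might do, not the actual implementation: the name `optimize_for_voice` and the regular expressions are hypothetical, and the natural spoken transitions mentioned elsewhere on this page are not modeled.

```python
import re

# Rough emoji coverage only: common pictograph blocks plus the variation selector.
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
_LINK = re.compile(r"\[([^\]]+)\]\([^)]+\)")
_HEADING = re.compile(r"^[ \t]*#{1,6}[ \t]+", re.MULTILINE)
_BULLET = re.compile(r"^[ \t]*(?:[-*•]|\d+\.)[ \t]+", re.MULTILINE)
_EMPHASIS = re.compile(r"(\*\*|\*|`)(.+?)\1")


def optimize_for_voice(text: str) -> str:
    """Remove visual formatting so the text reads as plain sentences."""
    text = _LINK.sub(r"\1", text)      # keep the link label, drop the URL
    text = _HEADING.sub("", text)      # drop heading markers
    text = _BULLET.sub("", text)       # drop bullet and numbering markers
    text = _EMPHASIS.sub(r"\2", text)  # unwrap bold, italic, inline code
    text = _EMOJI.sub("", text)
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # End each former line with punctuation so the speech engine pauses.
        if line[-1] not in ".!?:;,":
            line += "."
        sentences.append(line)
    return " ".join(sentences)
```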