Platform Architecture Overview
Swisper is a multi-agent AI assistant built on LangGraph. Its architecture follows four core principles:
- Specialization — Each domain agent masters one capability (research, productivity, documents, wealth).
- Orchestration — A central Global Supervisor coordinates the entire conversation flow.
- Personalization — Persistent memory across conversations enables the assistant to learn preferences and context over time.
- Modularity — New domain agents can be added without changing the orchestrator or other agents.
System Context
The following diagram shows how users, the platform, domain agents, and external services interact:
graph TB
subgraph Users ["Users"]
WebUser["Web App\n(React)"]
VoiceUser["Voice\n(Azure Speech)"]
end
subgraph Platform ["Swisper Platform"]
direction TB
API["FastAPI\nAPI Layer"]
subgraph Orchestration ["Core Orchestration"]
GS["Global Supervisor\n(LangGraph StateGraph)"]
IC["Intent Classification"]
GP["Global Planner"]
end
subgraph Memory ["Memory & Knowledge"]
FS["Fact System"]
ED["Entity Disambiguation"]
SUM["Summarization"]
SR["Semantic Retrieval\n(pgvector)"]
end
subgraph Interaction ["User Interaction"]
UI["UI Response System"]
VS["Voice System"]
GR["Greeting System"]
HITL["HITL System"]
end
subgraph DomainAgents ["Domain Agents"]
RA["Research Agent\n(web, weather, news)"]
PA["Productivity Agent\n(email, calendar)"]
DA["Document Agent\n(RAG search)"]
WA["Wealth Agent\n(WealthOS)"]
end
end
subgraph External ["External Services"]
LLM["LLM Providers\n(Gemini, Claude, Azure OpenAI,\nKvant)"]
Azure["Azure Speech\nServices"]
Gmail["Gmail / Outlook"]
MCP["MCP Research\nService"]
WOS["WealthOS API"]
end
subgraph Data ["Data Layer"]
PG["PostgreSQL\n+ pgvector"]
Redis["Redis"]
end
WebUser --> API
VoiceUser --> VS
VS --> API
API --> GS
GS --> IC
GS --> GP
GS --> Memory
GP --> DomainAgents
DomainAgents --> LLM
RA --> MCP
PA --> Gmail
WA --> WOS
VS --> Azure
GS --> UI
UI --> API
GS --> Data
Memory --> Data
DomainAgents --> Data
Conversation Flow
Every user interaction follows this path through the Global Supervisor:
flowchart TD
Start["User message arrives"] --> Init["Session Init\n(load history, avatar)"]
Init --> SumCheck{"Conversation\ntoo long?"}
SumCheck -->|Yes| Summarize["Summarize\n(compress context)"]
SumCheck -->|No| Context["Load Context\n(avatar, preferences, facts)"]
Summarize --> Context
Context --> HITL{"Pending HITL\ninterrupt?"}
HITL -->|Yes| HITLHandle["Handle HITL\n(user answered a question)"]
HITL -->|No| Classify["Classify Intent"]
HITLHandle --> Classify
Classify --> Extract["Extract Facts +\nResolve Entities\n(parallel)"]
Extract --> Disambig{"Entity\nambiguous?"}
Disambig -->|Yes| AskUser["Ask User\n(which Thomas?)"]
Disambig -->|No| Retrieve["Semantic +\nTemporal Retrieval"]
AskUser --> Retrieve
Retrieve --> Assemble["Assemble Memory"]
Assemble --> Route{"Simple or\nComplex?"}
Route -->|Simple| SimpleUI["Generate\nDirect Response"]
Route -->|Complex| Plan["Global Planner\n(create execution plan)"]
Plan --> Execute["Execute Domain Agent"]
Execute --> PlanCheck{"Plan\ncomplete?"}
PlanCheck -->|No| Plan
PlanCheck -->|Needs clarification| HITLAsk["Ask User\n(HITL interrupt)"]
PlanCheck -->|Yes| ComplexUI["Assemble\nFinal Response"]
SimpleUI --> Persist["Save Messages\nto Database"]
ComplexUI --> Persist
HITLAsk --> Persist
Persist --> Done["Stream Response\nto User"]
Key Routing Decisions
| Decision Point | Logic |
|---|---|
| Summarization check | If the conversation history exceeds the token threshold, compress it before proceeding |
| HITL interrupt | If a previous agent asked the user a question and they've now answered, resume that flow first |
| Intent classification | Determines simple vs. complex. Sets routing flags for entity handling and retrieval strategy. |
| Entity disambiguation | If the extracted entities are ambiguous (multiple matches), pause and ask the user — either inline (non-blocking) or via HITL (blocking) |
| Simple vs. complex routing | Simple queries go directly to the UI response node. Complex queries go to the Global Planner for multi-step execution. |
| Agent execution loop | The planner and agent executor form a loop — the planner can invoke an agent, evaluate the result, and decide to invoke another agent, ask the user for more info, or finalize. |
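The decision points in the table can be sketched as a single routing function. This is a minimal illustration in plain Python, not the actual supervisor graph: `TurnState`, `route_turn`, the node names, and the `TOKEN_THRESHOLD` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    """Hypothetical, trimmed-down view of the supervisor state for one turn."""
    history_tokens: int
    pending_hitl: bool = False
    intent: str = "simple"  # "simple" or "complex", set by intent classification
    ambiguous_entities: bool = False


# Assumed value; the real summarization threshold is not stated in this document.
TOKEN_THRESHOLD = 8_000


def route_turn(state: TurnState) -> list[str]:
    """Return the ordered node names one turn passes through before responding."""
    path = ["session_init"]
    if state.history_tokens > TOKEN_THRESHOLD:
        path.append("summarize")  # compress context before loading it
    path.append("load_context")
    if state.pending_hitl:
        path.append("handle_hitl")  # resume the interrupted flow first
    path += ["classify_intent", "extract_facts_and_entities"]
    if state.ambiguous_entities:
        path.append("ask_user")  # disambiguation question
    path += ["retrieve", "assemble_memory"]
    # Simple queries answer directly; complex ones go to the planner loop.
    path.append("direct_response" if state.intent == "simple" else "global_planner")
    return path
```

Calling `route_turn(TurnState(history_tokens=10_000, intent="complex"))` yields a path that includes `summarize` and ends at `global_planner`, mirroring the "Yes" branch of the summarization check and the complex branch of the routing decision.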
Domain Agent Architecture
All domain agents implement a common interface (DomainAgentInterface) and are registered in the DomainAgentRegistry. This factory pattern allows the Global Planner to select and invoke agents by name.
| Agent | Capability | External Services | Key Tools |
|---|---|---|---|
| Research Agent | Web search, weather, news, finance, places, academic papers, patents, flights | MCP Research Service | Weather lookup, web search, news search, places search |
| Productivity Agent | Email management, calendar operations, contact resolution, daily briefings | Gmail API, Microsoft Graph (Office 365) | Send email, read inbox, create calendar event, list contacts |
| Document Agent | Semantic search and analysis of uploaded documents (RAG) | — (local) | Semantic search, document summary |
| Wealth Agent | Client lookup, portfolio analysis, holdings, transactions | WealthOS API | Client search, portfolio overview, holdings detail |
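The interface-plus-registry arrangement might look roughly like the sketch below. Only the names `DomainAgentInterface` and `DomainAgentRegistry` come from this document; the `run` method, the `register` and `get` methods, and `EchoResearchAgent` are hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod


class DomainAgentInterface(ABC):
    """Common contract; the method name `run` is an assumption for this sketch."""

    name: str

    @abstractmethod
    def run(self, task: str) -> str: ...


class DomainAgentRegistry:
    """Maps agent names to instances so a planner can look agents up by name."""

    def __init__(self) -> None:
        self._agents: dict[str, DomainAgentInterface] = {}

    def register(self, agent: DomainAgentInterface) -> None:
        if agent.name in self._agents:
            raise ValueError(f"agent already registered: {agent.name}")
        self._agents[agent.name] = agent

    def get(self, name: str) -> DomainAgentInterface:
        try:
            return self._agents[name]
        except KeyError:
            raise KeyError(f"unknown agent: {name}") from None


class EchoResearchAgent(DomainAgentInterface):
    """Stand-in agent that performs no real search."""

    name = "research"

    def run(self, task: str) -> str:
        return f"[research] {task}"


registry = DomainAgentRegistry()
registry.register(EchoResearchAgent())
result = registry.get("research").run("weather in Zurich")
```

Because callers depend only on the registry and the interface, adding another agent in this sketch means writing one new class and one `register` call.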
Each agent is itself a LangGraph StateGraph with its own planning, execution, and completion-evaluation nodes.
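A plan, execute, evaluate cycle of this kind can be sketched without LangGraph so that it runs with no dependencies. The node functions, the `AgentState` fields, and the two-step plan below are assumptions for illustration; the real agents are LangGraph graphs whose nodes are not shown in this document.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    goal: str
    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)
    done: bool = False


def plan_node(state: AgentState) -> AgentState:
    # Plan once; a real planner would call an LLM here.
    if not state.plan:
        state.plan = [f"look up {state.goal}", f"summarize {state.goal}"]
    return state


def execute_node(state: AgentState, tool: Callable[[str], str]) -> AgentState:
    # Run the next step that has no result yet.
    step = state.plan[len(state.results)]
    state.results.append(tool(step))
    return state


def evaluate_node(state: AgentState) -> AgentState:
    # Completion check: every planned step has produced a result.
    state.done = len(state.results) >= len(state.plan)
    return state


def run_agent(goal: str, tool: Callable[[str], str], max_iterations: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_iterations):  # bounded loop guards against a plan that never completes
        state = evaluate_node(execute_node(plan_node(state), tool))
        if state.done:
            break
    return state


final = run_agent("flights", str.upper)
```

Here `str.upper` stands in for a tool call, so `final.results` holds the two upper-cased plan steps and `final.done` is `True` after two iterations.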
The Productivity Agent additionally supports multi-provider routing — the same email operations work against both Gmail and Office 365, selected per user account via a provider factory.
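One way to picture the provider factory is a lookup from the account's provider key to a class that implements a shared protocol. The class names, the `send` signature, and the provider keys in this sketch are hypothetical, and the method bodies are placeholders rather than real Gmail API or Microsoft Graph calls.

```python
from typing import Protocol


class EmailProvider(Protocol):
    def send(self, to: str, subject: str) -> str: ...


class GmailProvider:
    def send(self, to: str, subject: str) -> str:
        return f"gmail:{to}:{subject}"  # placeholder for a Gmail API call


class Office365Provider:
    def send(self, to: str, subject: str) -> str:
        return f"office365:{to}:{subject}"  # placeholder for a Microsoft Graph call


# Assumed provider keys; the real per-account configuration is not shown here.
_PROVIDERS: dict[str, type] = {"gmail": GmailProvider, "office365": Office365Provider}


def provider_for(account_provider: str) -> EmailProvider:
    """Return the provider implementation configured for a user account."""
    try:
        return _PROVIDERS[account_provider]()
    except KeyError:
        raise ValueError(f"unsupported email provider: {account_provider}") from None
```

The agent's tools would call `provider_for(...)` and then use only the `EmailProvider` methods, so the same "send email" operation works regardless of which backend the account uses.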
LLM Adapter Factory
Swisper is model-agnostic. The LLM Adapter Factory provides a unified interface for calling any supported language model:
| Provider | SDK | Models | Use Case |
|---|---|---|---|
| Google Gemini | google-genai (native) | Gemini 2.0 Flash, Gemini 2.5 Flash | Primary provider — fast, cost-effective |
| Anthropic Claude | anthropic (native) | Claude models via Vertex AI | Advanced reasoning tasks |
| Kvant | llm-adapter (native) | DeepSeek, Llama 4, etc. | Summarization, title generation |
| Azure OpenAI | openai (legacy bridge) | GPT-4o, GPT-4o-mini | Enterprise customers, specific task quality |
The factory pattern (LLMAdapterFactory) creates provider-specific adapters that all implement LLMAdapterInterface. This means:
- Switching providers is a config change, not a code change. Each node in the graph can use a different provider.
- Per-node provider selection is supported — the intent classifier can use a fast, cheap model while the response generator uses a more capable one.
- The adapter handles provider-specific details (API keys, endpoints, token counting, streaming behavior) behind a uniform interface.
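Per-node provider selection could be wired up along these lines. Only `LLMAdapterFactory` and `LLMAdapterInterface` are names from this document; the `complete` and `for_node` methods, the `FakeAdapter` class, the node keys, and the model strings are placeholders assumed for the sketch, and no real provider SDK is called.

```python
from abc import ABC, abstractmethod


class LLMAdapterInterface(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class FakeAdapter(LLMAdapterInterface):
    """Stand-in adapter: returns a tagged echo instead of calling a provider."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"{self.provider}/{self.model}: {prompt}"


class LLMAdapterFactory:
    """Builds an adapter for a graph node from configuration alone."""

    def __init__(self, node_config: dict[str, tuple[str, str]], default: tuple[str, str]) -> None:
        self._node_config = node_config
        self._default = default

    def for_node(self, node: str) -> LLMAdapterInterface:
        # Nodes without an explicit entry fall back to the default provider.
        provider, model = self._node_config.get(node, self._default)
        return FakeAdapter(provider, model)


# Hypothetical configuration: a cheap model for classification, a stronger one for responses.
factory = LLMAdapterFactory(
    node_config={
        "intent_classification": ("gemini", "fast-model"),
        "ui_response": ("claude", "capable-model"),
    },
    default=("gemini", "fast-model"),
)
reply = factory.for_node("ui_response").complete("hello")
```

Changing which model a node uses is then an edit to `node_config`, with no change to the node's own code.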
State Management
The Global Supervisor maintains a rich state object (GlobalSupervisorState) that flows through every node in the graph. Key state domains:
| Domain | What It Holds |
|---|---|
| Session | Chat history, user ID, conversation ID, session metadata |
| Intent | Classification result, routing flags, detected entities |
| Memory | Retrieved facts, preferences, semantic search results, temporal context |
| Planning | Global plan (steps, current step, agent assignments), execution history |
| Agent | Current agent state, tool calls, agent results |
| UI | Response type, streaming state, prompt variant selection |
| HITL | Interrupt state, pending questions, user responses |
State is checkpointed to Redis at each graph node, enabling:
- Resume after interrupts — If the HITL system pauses execution to ask the user a question, the state is saved. When the user responds, execution resumes from exactly where it stopped.
- Crash recovery — If the backend restarts mid-conversation, the state can be recovered from the last checkpoint.
Data Layer
| Store | Technology | What It Holds | Why This Choice |
|---|---|---|---|
| Primary database | PostgreSQL | Users, conversations, messages, facts, entities, preferences, agent logs | ACID transactions, relational integrity, mature ecosystem |
| Vector store | pgvector (PostgreSQL extension) | Fact embeddings for semantic similarity search | Collocated with primary data — no separate vector DB to manage |
| Cache / state | Redis | Session state, LangGraph checkpoints, stop flags, rate limits | Sub-millisecond reads, pub/sub for real-time events |
Semantic Search (pgvector)
The Fact System stores facts as embeddings in pgvector, enabling queries like "find facts related to the user's travel preferences" using cosine similarity search. This is the foundation of Swisper's personalization — retrieved facts are injected into the LLM context so the assistant can reference what it knows about the user.
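To make the ranking step concrete, the sketch below computes cosine similarity in plain Python over a few toy three-dimensional vectors. In the platform this comparison would run inside PostgreSQL (pgvector exposes cosine distance through its `<=>` operator); the fact texts, vectors, and function names here are invented for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_facts(query: list[float], facts: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k facts most similar to the query embedding."""
    ranked = sorted(facts, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: the first dimension loosely stands for "travel", the last for "health".
FACTS = [
    ("prefers window seats", [0.9, 0.1, 0.0]),
    ("allergic to peanuts", [0.0, 0.2, 0.9]),
    ("usually flies from Zurich", [0.8, 0.3, 0.1]),
]
travel_query = [1.0, 0.2, 0.0]
retrieved = top_k_facts(travel_query, FACTS, k=2)
```

With this data, `retrieved` contains the two travel-related facts, which is the set that would be injected into the LLM context for a travel question.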
Security and Privacy
| Principle | Implementation |
|---|---|
| European hosting | All infrastructure runs on European servers (EU data residency) |
| Model agnosticism | No dependency on a single LLM provider. Supports Gemini, Claude, Kvant, and Azure OpenAI with per-node selection. |
| No training on user data | User data is never sent to LLM providers for model training. API calls use inference-only endpoints. |
| Authentication | Two-factor authentication (TOTP) via the Authentication module. JWT tokens for session management. |
| HITL consent | Agents cannot execute sensitive actions without explicit user confirmation (HITL interrupt pattern). |
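The HITL consent rule in the last row amounts to a gate in front of sensitive actions. The following is a minimal sketch under stated assumptions: which actions count as sensitive, the `execute_action` function, and the `ActionResult` shape are all hypothetical and not taken from the platform's code.

```python
from dataclasses import dataclass

# Assumed classification; the document does not list which actions are sensitive.
SENSITIVE_ACTIONS = {"send_email", "create_calendar_event"}


@dataclass
class ActionResult:
    status: str  # "executed" or "needs_confirmation"
    detail: str


def execute_action(action: str, payload: str, confirmed: bool = False) -> ActionResult:
    """Run an action, but pause for explicit user confirmation if it is sensitive."""
    if action in SENSITIVE_ACTIONS and not confirmed:
        # In the platform this is where a HITL interrupt would be raised.
        return ActionResult("needs_confirmation", f"Confirm {action}: {payload}?")
    return ActionResult("executed", f"{action} done")


first_attempt = execute_action("send_email", "to=anna@example.com")
after_consent = execute_action("send_email", "to=anna@example.com", confirmed=True)
```

The first call returns a confirmation request instead of acting; only the second call, made after the user agrees, executes the action.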
Technology Stack
| Layer | Technologies | Notes |
|---|---|---|
| Backend | Python 3.12, FastAPI | Async API, WebSocket support for streaming |
| Agent Framework | LangGraph, LangChain | StateGraph with conditional routing, checkpointed state |
| LLM | Google Gemini, Anthropic Claude, Kvant, Azure OpenAI | Via LLM Adapter Factory (per-node provider selection) |
| Database | PostgreSQL + pgvector, Redis | Facts + embeddings in PG, state + cache in Redis |
| Voice | Azure Speech Services | STT + TTS via WebSocket streaming |
| Frontend | React, TypeScript, Vite | Single-page app with real-time streaming |
| Infrastructure | Docker, Kubernetes, Helm | Container orchestration for deployment |
| CI/CD | GitHub Actions | Backend tests, frontend build, documentation deploy |
| Documentation | Zensical | Static site generator, deployed to EU VPS |
What's Next
The architecture is designed to grow. The key extension points are:
- New domain agents — Add a new agent by implementing DomainAgentInterface and registering it in the DomainAgentRegistry; the Global Planner can then invoke it. No changes to the supervisor graph are needed.
- New LLM providers — Add a new adapter implementing LLMAdapterInterface and register it in the factory. Providers are configurable per node.
- Swisper Signals — A proactive notification system (implemented) that delivers alerts via Telegram and Threema channels. Includes background jobs for email notifications, daily briefings, pre-meeting prep, commitment reminders, and awaiting-response alerts.
- Swisper Dox (planned) — The B2B execution layer that transforms partner APIs into agent-ready workflows using trust policies and hybrid orchestration.