Platform Architecture Overview

Swisper is a multi-agent AI assistant built on LangGraph. Its architecture follows four core principles:

Specialization — Each domain agent masters one capability (research, productivity, documents, wealth).
Orchestration — A central Global Supervisor coordinates the entire conversation flow.
Personalization — Persistent memory across conversations enables the assistant to learn preferences and context over time.
Modularity — New domain agents can be added without changing the orchestrator or other agents.

System Context

The following diagram shows how users, the platform, domain agents, and external services interact:

graph TB
    subgraph Users ["Users"]
        WebUser["Web App\n(React)"]
        VoiceUser["Voice\n(Azure Speech)"]
    end

    subgraph Platform ["Swisper Platform"]
        direction TB
        API["FastAPI\nAPI Layer"]

        subgraph Orchestration ["Core Orchestration"]
            GS["Global Supervisor\n(LangGraph StateGraph)"]
            IC["Intent Classification"]
            GP["Global Planner"]
        end

        subgraph Memory ["Memory & Knowledge"]
            FS["Fact System"]
            ED["Entity Disambiguation"]
            SUM["Summarization"]
            SR["Semantic Retrieval\n(pgvector)"]
        end

        subgraph Interaction ["User Interaction"]
            UI["UI Response System"]
            VS["Voice System"]
            GR["Greeting System"]
            HITL["HITL System"]
        end

        subgraph DomainAgents ["Domain Agents"]
            RA["Research Agent\n(web, weather, news)"]
            PA["Productivity Agent\n(email, calendar)"]
            DA["Document Agent\n(RAG search)"]
            WA["Wealth Agent\n(WealthOS)"]
        end
    end

    subgraph External ["External Services"]
        LLM["LLM Providers\n(Gemini, Claude, Azure OpenAI,\nKvant)"]
        Azure["Azure Speech\nServices"]
        Gmail["Gmail / Outlook"]
        MCP["MCP Research\nService"]
        WOS["WealthOS API"]
    end

    subgraph Data ["Data Layer"]
        PG["PostgreSQL\n+ pgvector"]
        Redis["Redis"]
    end

    WebUser --> API
    VoiceUser --> VS
    VS --> API
    API --> GS
    GS --> IC
    GS --> GP
    GS --> Memory
    GP --> DomainAgents
    DomainAgents --> LLM
    RA --> MCP
    PA --> Gmail
    WA --> WOS
    VS --> Azure
    GS --> UI
    UI --> API
    GS --> Data
    Memory --> Data
    DomainAgents --> Data

Conversation Flow

Every user interaction follows this path through the Global Supervisor:

flowchart TD
    Start["User message arrives"] --> Init["Session Init\n(load history, avatar)"]
    Init --> SumCheck{"Conversation\ntoo long?"}
    SumCheck -->|Yes| Summarize["Summarize\n(compress context)"]
    SumCheck -->|No| Context["Load Context\n(avatar, preferences, facts)"]
    Summarize --> Context
    Context --> HITL{"Pending HITL\ninterrupt?"}
    HITL -->|Yes| HITLHandle["Handle HITL\n(user answered a question)"]
    HITL -->|No| Classify["Classify Intent"]
    HITLHandle --> Classify

    Classify --> Extract["Extract Facts +\nResolve Entities\n(parallel)"]
    Extract --> Disambig{"Entity\nambiguous?"}
    Disambig -->|Yes| AskUser["Ask User\n(which Thomas?)"]
    Disambig -->|No| Retrieve["Semantic +\nTemporal Retrieval"]
    AskUser --> Retrieve

    Retrieve --> Assemble["Assemble Memory"]

    Assemble --> Route{"Simple or\nComplex?"}
    Route -->|Simple| SimpleUI["Generate\nDirect Response"]
    Route -->|Complex| Plan["Global Planner\n(create execution plan)"]

    Plan --> Execute["Execute Domain Agent"]
    Execute --> PlanCheck{"Plan\ncomplete?"}
    PlanCheck -->|No| Plan
    PlanCheck -->|Needs clarification| HITLAsk["Ask User\n(HITL interrupt)"]
    PlanCheck -->|Yes| ComplexUI["Assemble\nFinal Response"]

    SimpleUI --> Persist["Save Messages\nto Database"]
    ComplexUI --> Persist
    HITLAsk --> Persist
    Persist --> Done["Stream Response\nto User"]

Key Routing Decisions

Decision Point	Logic
Summarization check	If conversation history exceeds token threshold, compress before proceeding
HITL interrupt	If a previous agent asked the user a question and they've now answered, resume that flow first
Intent classification	Determines whether the query is simple or complex, and sets routing flags for entity handling and retrieval strategy.
Entity disambiguation	If the extracted entities are ambiguous (multiple matches), pause and ask the user — either inline (non-blocking) or via HITL (blocking)
Simple vs. complex routing	Simple queries go directly to the UI response node. Complex queries go to the Global Planner for multi-step execution.
Agent execution loop	The planner and agent executor form a loop — the planner can invoke an agent, evaluate the result, and decide to invoke another agent, ask the user for more info, or finalize.
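
The decisions in this table can be sketched as a few small predicates over the turn state. This is a minimal illustration, not the actual supervisor code: the `TurnState` fields, the node names, and the 8,000-token threshold are all assumptions, since the source only speaks of "a token threshold".

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical value; the real threshold is not stated in this document.
SUMMARIZE_TOKEN_THRESHOLD = 8_000


@dataclass
class TurnState:
    """Hypothetical, minimal stand-in for the supervisor state."""
    history_tokens: int
    pending_hitl: bool
    intent: Literal["simple", "complex"]
    ambiguous_entities: bool


def needs_summarization(state: TurnState) -> bool:
    """Summarization check: compress only when history is too long."""
    return state.history_tokens > SUMMARIZE_TOKEN_THRESHOLD


def route_after_memory(state: TurnState) -> str:
    """Simple vs. complex routing once memory has been assembled."""
    return "simple_response" if state.intent == "simple" else "global_planner"


def trace(state: TurnState) -> list[str]:
    """Return the ordered (hypothetical) node names a turn would visit."""
    path = ["session_init"]
    if needs_summarization(state):
        path.append("summarize")
    path.append("load_context")
    if state.pending_hitl:
        path.append("handle_hitl")
    path += ["classify_intent", "extract_and_resolve"]
    if state.ambiguous_entities:
        path.append("ask_user")
    path += ["retrieve", "assemble_memory", route_after_memory(state)]
    return path
```

Calling `trace(TurnState(9000, False, "complex", False))` would include `summarize` and end at `global_planner`, mirroring the flowchart up to the routing step.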

Domain Agent Architecture

All domain agents implement a common interface (DomainAgentInterface) and are registered in the DomainAgentRegistry. This registry lets the Global Planner select and invoke agents by name.

Agent	Capability	External Services	Key Tools
Research Agent	Web search, weather, news, finance, places, academic papers, patents, flights	MCP Research Service	Weather lookup, web search, news search, places search
Productivity Agent	Email management, calendar operations, contact resolution, daily briefings	Gmail API, Microsoft Graph (Office 365)	Send email, read inbox, create calendar event, list contacts
Document Agent	Semantic search and analysis of uploaded documents (RAG)	— (local)	Semantic search, document summary
Wealth Agent	Client lookup, portfolio analysis, holdings, transactions	WealthOS API	Client search, portfolio overview, holdings detail
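
To make the interface-plus-registry idea concrete, here is a minimal sketch. Only the names DomainAgentInterface and DomainAgentRegistry come from this document; the `run`, `register`, `get`, and `names` methods and the toy `EchoResearchAgent` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class DomainAgentInterface(ABC):
    """Common contract; the `run` method name is an assumption."""

    name: str

    @abstractmethod
    def run(self, task: str) -> str:
        """Execute a task and return a textual result."""


class DomainAgentRegistry:
    """Name-to-agent lookup, as a planner might use it."""

    def __init__(self) -> None:
        self._agents: dict[str, DomainAgentInterface] = {}

    def register(self, agent: DomainAgentInterface) -> None:
        if agent.name in self._agents:
            raise ValueError(f"agent already registered: {agent.name}")
        self._agents[agent.name] = agent

    def get(self, name: str) -> DomainAgentInterface:
        if name not in self._agents:
            raise KeyError(f"unknown agent: {name}")
        return self._agents[name]

    def names(self) -> list[str]:
        return sorted(self._agents)


class EchoResearchAgent(DomainAgentInterface):
    """Toy agent used only for illustration; it performs no real research."""

    name = "research"

    def run(self, task: str) -> str:
        return f"[research] {task}"


registry = DomainAgentRegistry()
registry.register(EchoResearchAgent())
result = registry.get("research").run("weather in Zurich")
```

Because the planner only depends on the registry and the interface, adding another agent in this sketch is one extra `register` call.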

Each agent is itself a LangGraph StateGraph with its own planning, execution, and completion evaluation nodes. The pattern is:

Agent Planner → Tool Execution → Completion Evaluator → (loop or return result)
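
The loop above can be approximated in plain Python. The real agents are LangGraph StateGraphs; this sketch deliberately swaps the graph for an ordinary `for` loop to show only the control flow. The function names, the callback signatures, and the iteration cap are assumptions.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]
Planner = Callable[[str, list[str]], Optional[str]]
Evaluator = Callable[[str, list[str]], bool]


def run_agent(
    task: str,
    tools: dict[str, Tool],
    planner: Planner,
    is_complete: Evaluator,
    max_iterations: int = 5,
) -> list[str]:
    """Planner -> tool execution -> completion evaluator, repeated."""
    results: list[str] = []
    for _ in range(max_iterations):  # hard cap so a bad plan cannot loop forever
        tool_name = planner(task, results)  # Agent Planner
        if tool_name is None:
            break
        results.append(tools[tool_name](task))  # Tool Execution
        if is_complete(task, results):  # Completion Evaluator
            break
    return results


# Toy wiring: two fake tools, a fixed plan, and "done after two results".
tools = {
    "weather": lambda q: f"weather:{q}",
    "web_search": lambda q: f"search:{q}",
}
order = ["weather", "web_search"]


def planner(task: str, results: list[str]) -> Optional[str]:
    return order[len(results)] if len(results) < len(order) else None


def is_complete(task: str, results: list[str]) -> bool:
    return len(results) >= 2


output = run_agent("Zurich", tools, planner, is_complete)
```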

The Productivity Agent additionally supports multi-provider routing — the same email operations work against both Gmail and Office 365, selected per user account via a provider factory.
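
A provider factory of this kind might look like the following sketch. The class names, the `send_email` method, and the `"provider"` account key are hypothetical; the real adapters would call the Gmail API and Microsoft Graph rather than return strings.

```python
from abc import ABC, abstractmethod


class EmailProvider(ABC):
    """Hypothetical common interface for email backends."""

    @abstractmethod
    def send_email(self, to: str, subject: str) -> str:
        """Send a message and return a short status string."""


class GmailProvider(EmailProvider):
    def send_email(self, to: str, subject: str) -> str:
        # A real implementation would call the Gmail API here.
        return f"gmail -> {to}: {subject}"


class Office365Provider(EmailProvider):
    def send_email(self, to: str, subject: str) -> str:
        # A real implementation would call Microsoft Graph here.
        return f"office365 -> {to}: {subject}"


PROVIDERS: dict[str, type[EmailProvider]] = {
    "gmail": GmailProvider,
    "office365": Office365Provider,
}


def provider_for(account: dict[str, str]) -> EmailProvider:
    """Select the backend from the user's account record."""
    kind = account["provider"]
    if kind not in PROVIDERS:
        raise ValueError(f"unsupported email provider: {kind}")
    return PROVIDERS[kind]()


status = provider_for({"provider": "office365"}).send_email("anna@example.com", "Agenda")
```

The calling code never branches on the provider, which is what lets the same email operation work for both account types.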

LLM Adapter Factory

Swisper is model-agnostic. The LLM Adapter Factory provides a unified interface for calling any supported language model:

Provider	SDK	Models	Use Case
Google Gemini	`google-genai` (native)	Gemini 2.0 Flash, Gemini 2.5 Flash	Primary provider — fast, cost-effective
Anthropic Claude	`anthropic` (native)	Claude models via Vertex AI	Advanced reasoning tasks
Kvant	`llm-adapter` (native)	DeepSeek, Llama 4, etc.	Summarization, title generation
Azure OpenAI	`openai` (legacy bridge)	GPT-4o, GPT-4o-mini	Enterprise customers, specific task quality

The factory pattern (LLMAdapterFactory) creates provider-specific adapters that all implement LLMAdapterInterface. This means:

Switching providers is a config change, not a code change.
Per-node provider selection is supported — the intent classifier can use a fast, cheap model while the response generator uses a more capable one.
The adapter handles provider-specific details (API keys, endpoints, token counting, streaming behavior) behind a uniform interface.
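
The following sketch shows one way per-node provider selection could be wired. Only the names LLMAdapterFactory and LLMAdapterInterface appear in this document; the `complete` method, the `register`/`create` class methods, the `NODE_LLM_CONFIG` mapping, and the placeholder model names are assumptions. The `EchoAdapter` makes no network calls.

```python
from abc import ABC, abstractmethod
from typing import Callable


class LLMAdapterInterface(ABC):
    """Uniform contract; the `complete` method name is an assumption."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's reply to a prompt."""


class EchoAdapter(LLMAdapterInterface):
    """Stand-in adapter that echoes instead of calling a provider."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"{self.provider}/{self.model}: {prompt}"


class LLMAdapterFactory:
    _builders: dict[str, Callable[[str], LLMAdapterInterface]] = {}

    @classmethod
    def register(cls, provider: str, builder: Callable[[str], LLMAdapterInterface]) -> None:
        cls._builders[provider] = builder

    @classmethod
    def create(cls, provider: str, model: str) -> LLMAdapterInterface:
        if provider not in cls._builders:
            raise ValueError(f"unsupported provider: {provider}")
        return cls._builders[provider](model)


for provider_name in ("gemini", "claude", "kvant", "azure_openai"):
    # Bind the loop variable as a default so each builder keeps its own name.
    LLMAdapterFactory.register(
        provider_name, lambda model, p=provider_name: EchoAdapter(p, model)
    )

# Hypothetical per-node configuration; model names are placeholders.
NODE_LLM_CONFIG = {
    "intent_classification": {"provider": "gemini", "model": "fast-model"},
    "ui_response": {"provider": "claude", "model": "capable-model"},
}


def adapter_for_node(node: str) -> LLMAdapterInterface:
    cfg = NODE_LLM_CONFIG[node]
    return LLMAdapterFactory.create(cfg["provider"], cfg["model"])
```

In this sketch, moving the intent classifier to another provider means editing one entry in `NODE_LLM_CONFIG`; no node code changes.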

State Management

The Global Supervisor maintains a single state object (GlobalSupervisorState) that flows through every node in the graph. It is organized into the following domains:

Domain	What It Holds
Session	Chat history, user ID, conversation ID, session metadata
Intent	Classification result, routing flags, detected entities
Memory	Retrieved facts, preferences, semantic search results, temporal context
Planning	Global plan (steps, current step, agent assignments), execution history
Agent	Current agent state, tool calls, agent results
UI	Response type, streaming state, prompt variant selection
HITL	Interrupt state, pending questions, user responses
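
As a rough picture of such a state object, the sketch below models a few of the domains from the table as a `TypedDict`. Only the name GlobalSupervisorState is taken from this document; every field name and the `apply_update` helper are assumptions chosen for illustration.

```python
from typing import Any, TypedDict


class GlobalSupervisorState(TypedDict, total=False):
    # Session
    user_id: str
    conversation_id: str
    chat_history: list[dict[str, str]]
    # Intent
    intent: str
    routing_flags: dict[str, bool]
    # Memory
    retrieved_facts: list[str]
    # Planning
    plan_steps: list[str]
    current_step: int
    # HITL
    pending_question: str


def apply_update(state: GlobalSupervisorState, update: dict[str, Any]) -> GlobalSupervisorState:
    """Merge a node's partial update into a new state without mutating the old one."""
    merged: dict[str, Any] = {**state, **update}
    return merged  # type: ignore[return-value]


initial = GlobalSupervisorState(user_id="u1", conversation_id="c1")
after_classify = apply_update(initial, {"intent": "complex", "routing_flags": {"needs_retrieval": True}})
```

Each node returning a small partial update, rather than rewriting the whole object, keeps nodes independent of the domains they do not touch.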

State is checkpointed to Redis at each graph node, which enables:

- Resume after interrupts: If the HITL system pauses execution to ask the user a question, the state is saved. When the user responds, execution resumes from exactly where it stopped.
- Crash recovery: If the backend restarts mid-conversation, the state can be recovered from the last checkpoint.
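
The checkpoint-and-resume behavior can be illustrated without Redis or LangGraph. In this sketch a dictionary stands in for Redis, state is assumed to be JSON-serializable, and the node list, store methods, and `run_turn` function are all hypothetical.

```python
import json
from typing import Any, Optional

NODES = ["classify_intent", "plan", "ask_user", "finalize"]


class InMemoryCheckpointStore:
    """Dict-backed stand-in for Redis (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, conversation_id: str, next_node: str, state: dict[str, Any]) -> None:
        self._data[conversation_id] = json.dumps({"next_node": next_node, "state": state})

    def load(self, conversation_id: str) -> Optional[dict[str, Any]]:
        raw = self._data.get(conversation_id)
        return json.loads(raw) if raw is not None else None

    def delete(self, conversation_id: str) -> None:
        self._data.pop(conversation_id, None)


def run_turn(store, conversation_id, state, user_reply=None):
    """Run nodes in order, checkpointing after each; pause at ask_user without a reply."""
    checkpoint = store.load(conversation_id)
    start = 0
    if checkpoint is not None:
        # Resume (after an interrupt or a restart) from the saved position.
        state = checkpoint["state"]
        start = NODES.index(checkpoint["next_node"])
    for index in range(start, len(NODES)):
        node = NODES[index]
        if node == "ask_user":
            if user_reply is None:
                store.save(conversation_id, node, state)
                return "interrupted", state
            state["user_reply"] = user_reply
        state["visited"].append(node)
        if index + 1 < len(NODES):
            store.save(conversation_id, NODES[index + 1], state)
    store.delete(conversation_id)
    return "done", state


store = InMemoryCheckpointStore()
status1, _ = run_turn(store, "c1", {"visited": []})
# A later call, possibly in a new process, picks up where the first stopped.
status2, final_state = run_turn(store, "c1", {"visited": []}, user_reply="the colleague")
```

The second call ignores the fresh state it is handed and continues from the checkpoint, which is the same mechanism that would serve crash recovery.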

Data Layer

Store	Technology	What It Holds	Why This Choice
Primary database	PostgreSQL	Users, conversations, messages, facts, entities, preferences, agent logs	ACID transactions, relational integrity, mature ecosystem
Vector store	pgvector (PostgreSQL extension)	Fact embeddings for semantic similarity search	Collocated with primary data — no separate vector DB to manage
Cache / state	Redis	Session state, LangGraph checkpoints, stop flags, rate limits	Sub-millisecond reads, pub/sub for real-time events

Semantic Search (pgvector)

The Fact System stores facts as embeddings in pgvector, enabling queries like "find facts related to the user's travel preferences" using cosine similarity search. This is the foundation of Swisper's personalization — retrieved facts are injected into the LLM context so the assistant can reference what it knows about the user.
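
To show what cosine similarity search does, the sketch below ranks a few facts against a query vector in plain Python. The three-dimensional vectors and the fact texts are made up for illustration; real embeddings have hundreds or thousands of dimensions. In PostgreSQL, pgvector exposes the equivalent cosine distance (one minus cosine similarity) through its `<=>` operator, so the ranking would be done in SQL rather than in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], facts: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k fact texts whose embeddings are most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query, emb), text) for text, emb in facts.items()),
        reverse=True,
    )
    return [text for _, text in scored[:k]]


# Toy embeddings (invented): first axis ~ "travel", third axis ~ "health".
facts = {
    "Prefers window seats": [0.9, 0.1, 0.0],
    "Allergic to peanuts": [0.0, 0.2, 0.9],
    "Flies from Zurich": [0.8, 0.3, 0.1],
}
travel_query = [1.0, 0.2, 0.0]
matches = top_k(travel_query, facts, k=2)
```

Here `matches` contains the two travel-related facts, which are the ones that would be injected into the LLM context for a travel question.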

Security and Privacy

Principle	Implementation
European hosting	All infrastructure runs on European servers (EU data residency)
Model agnosticism	No dependency on a single LLM provider. Supports Gemini, Claude, Kvant, and Azure OpenAI with per-node selection.
No training on user data	User data is never sent to LLM providers for model training. API calls use inference-only endpoints.
Authentication	Two-factor authentication (TOTP) via the Authentication module. JWT tokens for session management.
HITL consent	Agents cannot execute sensitive actions without explicit user confirmation (HITL interrupt pattern).
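
For readers unfamiliar with TOTP, the sketch below computes and checks a time-based one-time code following RFC 6238 (HMAC-SHA1 variant) using only the standard library. It is not the Authentication module's code: the function names, the 30-second step, and the one-step tolerance window are assumptions.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code using HMAC-SHA1."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    truncated = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(truncated % (10 ** digits)).zfill(digits)


def verify_totp(secret: bytes, code: str, unix_time: int, window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps on either side."""
    for drift in range(-window, window + 1):
        moment = max(0, unix_time + drift * step)
        candidate = totp(secret, moment, step, len(code))
        if hmac.compare_digest(candidate, code):  # constant-time comparison
            return True
    return False


# RFC 6238 Appendix B test secret (ASCII), 8-digit code at T = 59 seconds.
rfc_secret = b"12345678901234567890"
code_at_59 = totp(rfc_secret, 59, digits=8)
```

The small tolerance window absorbs clock drift between the server and the user's authenticator app.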

Technology Stack

Layer	Technologies	Notes
Backend	Python 3.12, FastAPI	Async API, WebSocket support for streaming
Agent Framework	LangGraph, LangChain	StateGraph with conditional routing, checkpointed state
LLM	Google Gemini, Anthropic Claude, Kvant, Azure OpenAI	Via LLM Adapter Factory (per-node provider selection)
Database	PostgreSQL + pgvector, Redis	Facts + embeddings in PG, state + cache in Redis
Voice	Azure Speech Services	STT + TTS via WebSocket streaming
Frontend	React, TypeScript, Vite	Single-page app with real-time streaming
Infrastructure	Docker, Kubernetes, Helm	Container orchestration for deployment
CI/CD	GitHub Actions	Backend tests, frontend build, documentation deploy
Documentation	Zensical	Static site generator, deployed to EU VPS

What's Next

The architecture is designed to grow. The key extension points and related product layers are:

New domain agents: Add a new agent by implementing DomainAgentInterface and registering it in the DomainAgentRegistry; the Global Planner can then invoke it. No changes to the supervisor graph are needed.
New LLM providers: Add a new adapter that implements LLMAdapterInterface and register it in the factory. The new provider can then be selected per node.
Swisper Signals (implemented): A proactive notification system that delivers alerts via Telegram and Threema channels. It includes background jobs for email notifications, daily briefings, pre-meeting prep, commitment reminders, and awaiting-response alerts.
Swisper Dox (planned): The B2B execution layer that transforms partner APIs into agent-ready workflows using trust policies and hybrid orchestration.