
Platform Architecture Overview

Swisper is a multi-agent AI assistant built on LangGraph. Its architecture follows four core principles:

  • Specialization — Each domain agent masters one capability (research, productivity, documents, wealth).
  • Orchestration — A central Global Supervisor coordinates the entire conversation flow.
  • Personalization — Persistent memory across conversations enables the assistant to learn preferences and context over time.
  • Modularity — New domain agents can be added without changing the orchestrator or other agents.

System Context

The following diagram shows how users, the platform, domain agents, and external services interact:

```mermaid
graph TB
    subgraph Users ["Users"]
        WebUser["Web App\n(React)"]
        VoiceUser["Voice\n(Azure Speech)"]
    end

    subgraph Platform ["Swisper Platform"]
        direction TB
        API["FastAPI\nAPI Layer"]

        subgraph Orchestration ["Core Orchestration"]
            GS["Global Supervisor\n(LangGraph StateGraph)"]
            IC["Intent Classification"]
            GP["Global Planner"]
        end

        subgraph Memory ["Memory & Knowledge"]
            FS["Fact System"]
            ED["Entity Disambiguation"]
            SUM["Summarization"]
            SR["Semantic Retrieval\n(pgvector)"]
        end

        subgraph Interaction ["User Interaction"]
            UI["UI Response System"]
            VS["Voice System"]
            GR["Greeting System"]
            HITL["HITL System"]
        end

        subgraph DomainAgents ["Domain Agents"]
            RA["Research Agent\n(web, weather, news)"]
            PA["Productivity Agent\n(email, calendar)"]
            DA["Document Agent\n(RAG search)"]
            WA["Wealth Agent\n(WealthOS)"]
        end
    end

    subgraph External ["External Services"]
        LLM["LLM Providers\n(Gemini, Claude, Azure OpenAI,\nKvant)"]
        Azure["Azure Speech\nServices"]
        Gmail["Gmail / Outlook"]
        MCP["MCP Research\nService"]
        WOS["WealthOS API"]
    end

    subgraph Data ["Data Layer"]
        PG["PostgreSQL\n+ pgvector"]
        Redis["Redis"]
    end

    WebUser --> API
    VoiceUser --> VS
    VS --> API
    API --> GS
    GS --> IC
    GS --> GP
    GS --> Memory
    GP --> DomainAgents
    DomainAgents --> LLM
    RA --> MCP
    PA --> Gmail
    WA --> WOS
    VS --> Azure
    GS --> UI
    UI --> API
    GS --> Data
    Memory --> Data
    DomainAgents --> Data
```

Conversation Flow

Every user interaction follows this path through the Global Supervisor:

```mermaid
flowchart TD
    Start["User message arrives"] --> Init["Session Init\n(load history, avatar)"]
    Init --> SumCheck{"Conversation\ntoo long?"}
    SumCheck -->|Yes| Summarize["Summarize\n(compress context)"]
    SumCheck -->|No| Context["Load Context\n(avatar, preferences, facts)"]
    Summarize --> Context
    Context --> HITL{"Pending HITL\ninterrupt?"}
    HITL -->|Yes| HITLHandle["Handle HITL\n(user answered a question)"]
    HITL -->|No| Classify["Classify Intent"]
    HITLHandle --> Classify

    Classify --> Extract["Extract Facts +\nResolve Entities\n(parallel)"]
    Extract --> Disambig{"Entity\nambiguous?"}
    Disambig -->|Yes| AskUser["Ask User\n(which Thomas?)"]
    Disambig -->|No| Retrieve["Semantic +\nTemporal Retrieval"]
    AskUser --> Retrieve

    Retrieve --> Assemble["Assemble Memory"]

    Assemble --> Route{"Simple or\nComplex?"}
    Route -->|Simple| SimpleUI["Generate\nDirect Response"]
    Route -->|Complex| Plan["Global Planner\n(create execution plan)"]

    Plan --> Execute["Execute Domain Agent"]
    Execute --> PlanCheck{"Plan\ncomplete?"}
    PlanCheck -->|No| Plan
    PlanCheck -->|Needs clarification| HITLAsk["Ask User\n(HITL interrupt)"]
    PlanCheck -->|Yes| ComplexUI["Assemble\nFinal Response"]

    SimpleUI --> Persist["Save Messages\nto Database"]
    ComplexUI --> Persist
    HITLAsk --> Persist
    Persist --> Done["Stream Response\nto User"]
```

Key Routing Decisions

| Decision Point | Logic |
|---|---|
| Summarization check | If the conversation history exceeds a token threshold, compress it before proceeding. |
| HITL interrupt | If a previous agent asked the user a question and the user has now answered, resume that flow first. |
| Intent classification | Determines whether the query is simple or complex, and sets routing flags for entity handling and retrieval strategy. |
| Entity disambiguation | If the extracted entities are ambiguous (multiple matches), pause and ask the user, either inline (non-blocking) or via HITL (blocking). |
| Simple vs. complex routing | Simple queries go directly to the UI response node. Complex queries go to the Global Planner for multi-step execution. |
| Agent execution loop | The planner and agent executor form a loop: the planner can invoke an agent, evaluate the result, and then invoke another agent, ask the user for more information, or finalize. |
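In a LangGraph StateGraph, decisions like these are typically expressed as conditional edges: a plain function reads the current state and returns the name of the next node. The following is a minimal pure-Python sketch of the routing functions implied by the table. The state keys (`history_tokens`, `pending_hitl`, `complexity`, `plan_complete`, `needs_clarification`), the node names, and the token threshold are all hypothetical, not Swisper's actual identifiers.

```python
from typing import Literal, TypedDict

# Hypothetical threshold; the real value would come from configuration.
SUMMARIZE_TOKEN_THRESHOLD = 8_000


class RoutingState(TypedDict, total=False):
    """Hypothetical subset of the supervisor state that the routers read."""
    history_tokens: int
    pending_hitl: bool
    complexity: Literal["simple", "complex"]
    plan_complete: bool
    needs_clarification: bool


def route_after_init(state: RoutingState) -> str:
    # Summarization check: compress long histories before loading context.
    if state.get("history_tokens", 0) > SUMMARIZE_TOKEN_THRESHOLD:
        return "summarize"
    return "load_context"


def route_after_context(state: RoutingState) -> str:
    # HITL interrupt: resume a paused flow before classifying a new intent.
    return "handle_hitl" if state.get("pending_hitl") else "classify_intent"


def route_after_memory(state: RoutingState) -> str:
    # Simple vs. complex: simple queries skip the planner entirely.
    if state.get("complexity") == "complex":
        return "global_planner"
    return "direct_response"


def route_after_agent(state: RoutingState) -> str:
    # Agent execution loop: ask the user, finish, or plan the next step.
    if state.get("needs_clarification"):
        return "hitl_ask"
    if state.get("plan_complete"):
        return "assemble_final_response"
    return "global_planner"
```

In LangGraph, each router would be attached with something like `graph.add_conditional_edges("load_context", route_after_context)`. Keeping routers as pure functions of state makes them easy to unit-test without invoking any LLM.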

Domain Agent Architecture

All domain agents implement a common interface (DomainAgentInterface) and are registered in the DomainAgentRegistry. This factory pattern allows the Global Planner to select and invoke agents by name.
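As an illustration of the interface-plus-registry pattern, here is a minimal sketch. Only the names `DomainAgentInterface` and `DomainAgentRegistry` come from this page; the `invoke` method, the `name` attribute, the decorator-based registration, and the `ResearchAgent` stub are hypothetical choices made for the example.

```python
from abc import ABC, abstractmethod
from typing import Any


class DomainAgentInterface(ABC):
    """Sketch of a common agent contract (method names are hypothetical)."""

    name: str  # unique key the planner uses to select this agent

    @abstractmethod
    def invoke(self, task: str, context: dict[str, Any]) -> dict[str, Any]:
        """Run the agent on one planner step and return its result."""


class DomainAgentRegistry:
    """Maps agent names to agent classes so the planner can create agents by name."""

    def __init__(self) -> None:
        self._factories: dict[str, type[DomainAgentInterface]] = {}

    def register(self, agent_cls: type[DomainAgentInterface]) -> type[DomainAgentInterface]:
        # Returns the class unchanged, so it can be used as a class decorator.
        if agent_cls.name in self._factories:
            raise ValueError(f"agent {agent_cls.name!r} is already registered")
        self._factories[agent_cls.name] = agent_cls
        return agent_cls

    def create(self, name: str) -> DomainAgentInterface:
        if name not in self._factories:
            raise KeyError(f"unknown agent {name!r}")
        return self._factories[name]()

    def names(self) -> list[str]:
        return sorted(self._factories)


registry = DomainAgentRegistry()


@registry.register
class ResearchAgent(DomainAgentInterface):
    name = "research"

    def invoke(self, task: str, context: dict[str, Any]) -> dict[str, Any]:
        # Placeholder: a real agent would run its own StateGraph here.
        return {"agent": self.name, "task": task, "status": "done"}
```

Because the planner only calls `registry.create(name)`, adding an agent touches only the new class and its registration, which matches the modularity principle above.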

| Agent | Capability | External Services | Key Tools |
|---|---|---|---|
| Research Agent | Web search, weather, news, finance, places, academic papers, patents, flights | MCP Research Service | Weather lookup, web search, news search, places search |
| Productivity Agent | Email management, calendar operations, contact resolution, daily briefings | Gmail API, Microsoft Graph (Office 365) | Send email, read inbox, create calendar event, list contacts |
| Document Agent | Semantic search and analysis of uploaded documents (RAG) | None (local) | Semantic search, document summary |
| Wealth Agent | Client lookup, portfolio analysis, holdings, transactions | WealthOS API | Client search, portfolio overview, holdings detail |

Each agent is itself a LangGraph StateGraph with its own planning, execution, and completion evaluation nodes. The pattern is:

Agent Planner → Tool Execution → Completion Evaluator → (loop or return result)

The Productivity Agent additionally supports multi-provider routing — the same email operations work against both Gmail and Office 365, selected per user account via a provider factory.
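A provider factory of this kind can be sketched as follows. The `EmailProvider` protocol, the provider classes, the `"provider"` account key, and the returned strings are hypothetical placeholders; real implementations would call the Gmail API and Microsoft Graph respectively.

```python
from typing import Protocol


class EmailProvider(Protocol):
    """Operations the Productivity Agent needs, independent of the backend."""

    def send(self, to: str, subject: str, body: str) -> str: ...


class GmailProvider:
    def send(self, to: str, subject: str, body: str) -> str:
        # Placeholder for a Gmail API call.
        return f"gmail:sent:{to}"


class Office365Provider:
    def send(self, to: str, subject: str, body: str) -> str:
        # Placeholder for a Microsoft Graph call.
        return f"graph:sent:{to}"


# Hypothetical mapping from the account's provider key to an implementation.
_PROVIDERS: dict[str, type] = {"gmail": GmailProvider, "office365": Office365Provider}


def get_email_provider(account: dict[str, str]) -> EmailProvider:
    """Select the provider for a user's connected account."""
    provider_cls = _PROVIDERS.get(account.get("provider", ""))
    if provider_cls is None:
        raise ValueError(f"unsupported email provider: {account.get('provider')!r}")
    return provider_cls()
```

The agent's tools depend only on `EmailProvider`, so "send email" is written once and works for either account type.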

LLM Adapter Factory

Swisper is model-agnostic. The LLM Adapter Factory provides a unified interface for calling any supported language model:

| Provider | SDK | Models | Use Case |
|---|---|---|---|
| Google Gemini | google-genai (native) | Gemini 2.0 Flash, Gemini 2.5 Flash | Primary provider: fast, cost-effective |
| Anthropic Claude | anthropic (native) | Claude models via Vertex AI | Advanced reasoning tasks |
| Kvant | llm-adapter (native) | DeepSeek, Llama 4, etc. | Summarization, title generation |
| Azure OpenAI | openai (legacy bridge) | GPT-4o, GPT-4o-mini | Enterprise customers, specific task quality |

The factory pattern (LLMAdapterFactory) creates provider-specific adapters that all implement LLMAdapterInterface. This means:

  • Switching providers is a config change, not a code change.
  • Per-node provider selection is supported: the intent classifier can use a fast, cheap model while the response generator uses a more capable one.
  • The adapter handles provider-specific details (API keys, endpoints, token counting, streaming behavior) behind a uniform interface.
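The factory and per-node selection described above can be sketched like this. `LLMAdapterFactory` and `LLMAdapterInterface` are named on this page; the `complete` method, the adapter classes, the node names, and the `NODE_LLM_CONFIG` mapping (including the placeholder Claude model string) are hypothetical. The adapters return canned strings instead of calling the real SDKs.

```python
from abc import ABC, abstractmethod


class LLMAdapterInterface(ABC):
    """Uniform interface every provider adapter implements (method name hypothetical)."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class GeminiAdapter(LLMAdapterInterface):
    def complete(self, prompt: str) -> str:
        # A real adapter would call the google-genai SDK here.
        return f"[gemini:{self.model}] {prompt}"


class ClaudeAdapter(LLMAdapterInterface):
    def complete(self, prompt: str) -> str:
        # A real adapter would call the anthropic SDK (via Vertex AI) here.
        return f"[claude:{self.model}] {prompt}"


class LLMAdapterFactory:
    _adapters: dict[str, type[LLMAdapterInterface]] = {
        "gemini": GeminiAdapter,
        "claude": ClaudeAdapter,
    }

    @classmethod
    def register(cls, provider: str, adapter_cls: type[LLMAdapterInterface]) -> None:
        # Adding a provider means registering one more adapter class.
        cls._adapters[provider] = adapter_cls

    @classmethod
    def create(cls, provider: str, model: str) -> LLMAdapterInterface:
        if provider not in cls._adapters:
            raise ValueError(f"unknown provider {provider!r}")
        return cls._adapters[provider](model)


# Hypothetical per-node config: a cheap model for classification, a stronger one for answers.
NODE_LLM_CONFIG = {
    "classify_intent": ("gemini", "gemini-2.0-flash"),
    "ui_response": ("claude", "placeholder-claude-model"),
}


def adapter_for_node(node: str) -> LLMAdapterInterface:
    provider, model = NODE_LLM_CONFIG[node]
    return LLMAdapterFactory.create(provider, model)
```

Changing a node's provider then means editing one entry in the config mapping, with no change to the node's code.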

State Management

The Global Supervisor maintains a rich state object (GlobalSupervisorState) that flows through every node in the graph. Key state domains:

| Domain | What It Holds |
|---|---|
| Session | Chat history, user ID, conversation ID, session metadata |
| Intent | Classification result, routing flags, detected entities |
| Memory | Retrieved facts, preferences, semantic search results, temporal context |
| Planning | Global plan (steps, current step, agent assignments), execution history |
| Agent | Current agent state, tool calls, agent results |
| UI | Response type, streaming state, prompt variant selection |
| HITL | Interrupt state, pending questions, user responses |

State is checkpointed to Redis after each graph node completes, which enables:

  • Resume after interrupts: if the HITL system pauses execution to ask the user a question, the state is saved. When the user responds, execution resumes exactly where it stopped.
  • Crash recovery: if the backend restarts mid-conversation, the state can be recovered from the last checkpoint.

Data Layer

| Store | Technology | What It Holds | Why This Choice |
|---|---|---|---|
| Primary database | PostgreSQL | Users, conversations, messages, facts, entities, preferences, agent logs | ACID transactions, relational integrity, mature ecosystem |
| Vector store | pgvector (PostgreSQL extension) | Fact embeddings for semantic similarity search | Collocated with primary data; no separate vector DB to manage |
| Cache / state | Redis | Session state, LangGraph checkpoints, stop flags, rate limits | Sub-millisecond reads, pub/sub for real-time events |

Semantic Search (pgvector)

The Fact System stores facts as embeddings in pgvector, enabling queries like "find facts related to the user's travel preferences" using cosine similarity search. This is the foundation of Swisper's personalization — retrieved facts are injected into the LLM context so the assistant can reference what it knows about the user.

Security and Privacy

| Principle | Implementation |
|---|---|
| European hosting | All infrastructure runs on European servers (EU data residency). |
| Model agnosticism | No dependency on a single LLM provider. Supports Gemini, Claude, Kvant, and Azure OpenAI with per-node selection. |
| No training on user data | User data is never sent to LLM providers for model training. API calls use inference-only endpoints. |
| Authentication | Two-factor authentication (TOTP) via the Authentication module. JWT tokens for session management. |
| HITL consent | Agents cannot execute sensitive actions without explicit user confirmation (HITL interrupt pattern). |

Technology Stack

| Layer | Technologies | Notes |
|---|---|---|
| Backend | Python 3.12, FastAPI | Async API, WebSocket support for streaming |
| Agent Framework | LangGraph, LangChain | StateGraph with conditional routing, checkpointed state |
| LLM | Google Gemini, Anthropic Claude, Kvant, Azure OpenAI | Via LLM Adapter Factory (per-node provider selection) |
| Database | PostgreSQL + pgvector, Redis | Facts + embeddings in PG, state + cache in Redis |
| Voice | Azure Speech Services | STT + TTS via WebSocket streaming |
| Frontend | React, TypeScript, Vite | Single-page app with real-time streaming |
| Infrastructure | Docker, Kubernetes, Helm | Container orchestration for deployment |
| CI/CD | GitHub Actions | Backend tests, frontend build, documentation deploy |
| Documentation | Zensical | Static site generator, deployed to EU VPS |

What's Next

The architecture is designed to grow. The key extension points are:

  • New domain agents — Implement DomainAgentInterface and register the agent in the DomainAgentRegistry; the Global Planner can then invoke it. No changes to the supervisor graph are needed.
  • New LLM providers — Add a new adapter implementing LLMAdapterInterface and register it in the factory. Configurable per-node.
  • Swisper Signals (implemented) — A proactive notification system that delivers alerts via Telegram and Threema. It includes background jobs for email notifications, daily briefings, pre-meeting prep, commitment reminders, and awaiting-response alerts.
  • Swisper Dox (planned) — The B2B execution layer that transforms partner APIs into agent-ready workflows using trust policies and hybrid orchestration.