Fact System — Overview

This content was migrated from Documentation/fact_entity_preference_extraction.md and restructured into audience sections. Review for accuracy against the current codebase.

What This Module Does

The Fact System is Swisper's long-term memory. It continuously learns from conversations to build a comprehensive understanding of each user — their relationships, preferences, health information, travel plans, and professional context. When a user says "I'm allergic to peanuts" or "my son Leo's birthday is in March," the system extracts that information, stores it, and uses it in future conversations to provide truly personalized responses.

The system handles three types of knowledge extraction:

  • Facts — Persistent information about the user and the people in their life (allergies, birthdays, job titles, travel plans)
  • Entities — People, pets, and relationships the user mentions (family members, colleagues, friends, service providers)
  • Preferences — How the user wants Swisper to communicate (be brief, use bullet points, formal tone)

All of this happens automatically in the background — users don't need to explicitly "save" information. The system also knows what not to remember: temporary states ("I'm at the airport"), action requests ("check my email"), and references to public figures are filtered out.

Who It Serves

Persona | Need
End Users | A personal assistant that remembers what matters (family details, dietary restrictions, upcoming travel) and uses that knowledge naturally in conversations
Product Owners | Understanding of how personalization works, what data is captured, and how privacy is maintained
Support Staff | Ability to explain why Swisper remembered (or didn't remember) certain information

Key Capabilities

  • Automatic fact extraction — Extracts persistent facts from every conversation turn without user intervention. Uses a "persistence test": would the user want me to remember this next week?
  • Intelligent entity resolution — Recognizes people mentioned in conversation and links them to existing contacts. Uses semantic understanding (not just name matching) to distinguish "Martin (son, 8 years old)" from "Martin (colleague, accountant)"
  • Safety-first prioritization — Allergies, medical conditions, and health information are treated as critical facts with boosted confidence scores and priority handling
  • Ambiguity handling — When the system can't confidently determine which entity a fact belongs to, it skips storage entirely rather than risk attributing information to the wrong person
  • Fact conflict detection — Detects when new information contradicts existing facts (e.g., a changed email address) and routes to user confirmation before overwriting
  • Vector-based retrieval — Stores fact embeddings for semantic search, so asking "what does Martin like to eat?" retrieves relevant dietary facts even if the exact words don't match
  • Preference layering — Applies communication preferences in layers: platform defaults → workspace settings → session overrides → per-message commands
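The preference layering order can be illustrated with a minimal sketch. It assumes each layer is a plain key/value mapping and that a more specific layer simply overrides a less specific one. The layer names, preference keys, and the `resolve_preferences` helper are hypothetical and are not taken from the Swisper codebase.

```python
from typing import Any, Mapping

# Layer order from the text, least specific first.
# Later layers override earlier ones. The names are hypothetical.
LAYER_ORDER = ("platform", "workspace", "session", "message")


def resolve_preferences(layers: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]:
    """Merge preference layers; the most specific layer that sets a key wins."""
    resolved: dict[str, Any] = {}
    for name in LAYER_ORDER:
        # A missing layer contributes nothing.
        resolved.update(layers.get(name, {}))
    return resolved


# Illustrative values only.
layers = {
    "platform": {"tone": "neutral", "format": "paragraphs", "length": "normal"},
    "workspace": {"tone": "formal"},
    "session": {"length": "brief"},
    "message": {"format": "bullets"},
}
effective = resolve_preferences(layers)
```

In this example the workspace sets the tone, the session shortens the answers, and the per-message command switches the format, while any key nobody overrides falls back to the platform default.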

How It Fits in the Platform

The Fact System operates within the Global Supervisor's memory pipeline:

  • Upstream: Receives the user's message after intent classification, which sets optimization flags (has_extractable_facts, needs_semantic_retrieval) to control which parts of the fact pipeline run
  • Downstream: Provides facts and entity context to the UI Response System, which weaves them naturally into Swisper's responses
  • Storage: Persists facts and entities to PostgreSQL with pgvector embeddings for semantic search. Uses Redis for caching and fact conflict queuing
  • Entity integration: Links facts to resolved entities via foreign keys, ensuring every fact is properly attributed to the right person
  • Retrieval: Semantic retrieval and temporal retrieval nodes pull relevant facts for each conversation turn
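As a rough illustration of how the optimization flags could gate the pipeline, the sketch below maps the two flags named above to the stages they enable. Only the flag names `has_extractable_facts` and `needs_semantic_retrieval` come from this page; the `IntentFlags` class, the `plan_stages` function, and the stage names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentFlags:
    """Flags set by intent classification (field names from this page)."""
    has_extractable_facts: bool
    needs_semantic_retrieval: bool


def plan_stages(flags: IntentFlags) -> list[str]:
    """Return the (hypothetical) pipeline stages that should run for a message."""
    stages: list[str] = []
    if flags.needs_semantic_retrieval:
        # Retrieval feeds the current response, so it is planned first.
        stages.append("semantic_retrieval")
    if flags.has_extractable_facts:
        # Extraction runs in the background and serves later turns.
        stages.append("background_fact_extraction")
    return stages


# A message such as "what's the weather?" might set neither flag.
nothing_to_do = plan_stages(IntentFlags(False, False))
# A message that both asks about and states personal information might set both.
full_pipeline = plan_stages(IntentFlags(True, True))
```

The point of the design is that a message with nothing to extract and nothing to look up skips both stages and pays no cost for them.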

Limits and Edge Cases

  • Background extraction latency — Facts extracted from the current message are available for the next turn, not the current one. The current turn uses information directly from the message to avoid blocking the response
  • Ambiguous entities cause data loss — When the system can't determine which entity a fact belongs to, it drops the fact entirely. This prevents data corruption but means some valid facts may not be stored
  • Embedding model dependency — Fact retrieval quality depends on the embedding model (currently Vertex AI gemini-embedding-001). Changes to the embedding model require re-embedding all stored facts
  • Session-only preferences — Preferences extracted from conversation ("be brief") apply only to the current session. For permanent preferences, users must configure them through the Settings UI
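The embedding model dependency is easier to see with a toy example of semantic retrieval. The sketch below ranks stored facts by cosine similarity to a query vector. The three-dimensional vectors are hand-written stand-ins, not real embeddings, and the fact texts and the `top_k` helper are hypothetical. In the platform this comparison is handled by pgvector rather than application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], facts: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k facts whose vectors are closest to the query."""
    ranked = sorted(facts, key=lambda f: cosine_similarity(query_vec, f[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: the first axis loosely stands for "food", the second for "work".
facts = [
    ("Martin is allergic to peanuts", [0.9, 0.1, 0.0]),
    ("Martin works as an accountant", [0.1, 0.9, 0.1]),
    ("Martin likes pasta", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]  # stand-in for "what does Martin like to eat?"
closest = top_k(query, facts)
```

The ranking is only meaningful when the query and the stored facts were embedded by the same model. Vectors from a different model live in a different space, which is why a model change requires re-embedding every stored fact.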

FAQ

Q: How does Swisper decide what to remember from a conversation? A: The system applies a "persistence test" to every piece of information: would the user want me to remember this next week? Facts like allergies, birthdays, and job titles pass the test. Temporary states ("I'm at the airport"), action requests ("check my email"), and questions ("what's the weather?") are filtered out.

Q: What happens when Swisper encounters two people with the same name? A: The entity resolution system uses semantic understanding — not just name matching — to distinguish between people. It checks for "red flags" like age-career contradictions (a child can't be a COO) and coherence (yoga instructor adding pilates makes sense). If it still can't determine which person is meant, it skips storing the fact rather than guessing wrong.
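The "skip rather than guess" behavior can be sketched as follows. This is a deliberately simplified, rule-based stand-in: the real resolver is described as using semantic understanding, which this example does not attempt to reproduce. The `Entity` class, the role list, the age threshold, and the `resolve` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    name: str
    relation: str
    age: Optional[int] = None


# Hypothetical values used only for this illustration.
ADULT_ROLES = {"coo", "accountant", "manager"}
MIN_WORKING_AGE = 16


def is_plausible(entity: Entity, fact_role: Optional[str]) -> bool:
    """Red-flag check: a young child cannot hold an adult job role."""
    if fact_role is not None and entity.age is not None:
        if fact_role.lower() in ADULT_ROLES and entity.age < MIN_WORKING_AGE:
            return False
    return True


def resolve(name: str, fact_role: Optional[str], known: list[Entity]) -> Optional[Entity]:
    """Return the single plausible match, or None so the caller skips storage."""
    candidates = [e for e in known if e.name == name and is_plausible(e, fact_role)]
    return candidates[0] if len(candidates) == 1 else None


known = [Entity("Martin", "son", age=8), Entity("Martin", "colleague")]
promoted = resolve("Martin", "COO", known)  # the 8-year-old son is ruled out
unclear = resolve("Martin", None, known)    # both Martins remain plausible
```

Here a fact about a COO role resolves to the colleague, because the 8-year-old son is ruled out. A fact with no distinguishing context leaves two candidates, so `resolve` returns `None` and nothing is stored.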

Q: Is my personal information secure? A: Yes. Facts are stored with encryption in the database, scoped to your specific avatar and workspace. A privacy mode controls whether sensitive information (medical, allergies) is included in responses. Workspace isolation ensures your data is not shared across workspaces.

Q: How much does the fact extraction cost per message? A: Fact and entity extraction uses approximately 9,600 tokens per message (~0.004 CHF at standard pricing). Because extraction runs in the background while you read the response, it adds no perceived latency. Preference extraction adds roughly another 4,300 tokens, but it runs only on the 30–40% of messages that contain preference commands.
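The figures in this answer can be combined into an expected per-message average. The sketch below is plain arithmetic on the numbers quoted above. It assumes that preference extraction is billed at the same per-token rate as fact and entity extraction, which this page does not state, and it treats the approximate figure of 0.004 CHF as exact.

```python
FACT_ENTITY_TOKENS = 9_600    # per message, from the answer above
PREFERENCE_TOKENS = 4_300     # per message that contains a preference command
COST_PER_MESSAGE_CHF = 0.004  # approximate cost quoted for 9,600 tokens


def expected_tokens(preference_share: float) -> float:
    """Average tokens per message when a given share of messages triggers preference extraction."""
    return FACT_ENTITY_TOKENS + preference_share * PREFERENCE_TOKENS


def expected_cost_chf(preference_share: float) -> float:
    """Average cost per message, assuming one flat per-token rate (an assumption)."""
    rate_per_token = COST_PER_MESSAGE_CHF / FACT_ENTITY_TOKENS
    return expected_tokens(preference_share) * rate_per_token


low_tokens = expected_tokens(0.30)
high_tokens = expected_tokens(0.40)
low_cost = expected_cost_chf(0.30)
high_cost = expected_cost_chf(0.40)
```

Under these assumptions the average comes to roughly 10,900 to 11,300 tokens per message, or about 0.0045 to 0.0047 CHF.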

Q: Can I manually manage my stored facts? A: Yes. The Settings UI provides full CRUD operations for viewing, editing, and deleting facts. You can also add facts during onboarding via a free-text description. Facts can be managed at both the avatar level (visible across all workspaces) and workspace level (scoped to a specific workspace).