Fact System — Architecture

This content was migrated from Documentation/fact_entity_preference_extraction.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The Fact System exists to give Swisper persistent, personalized memory. Without it, every conversation would start from zero — the assistant would not know the user's name, family, allergies, or preferences.

The driving requirements behind its design are:

Automatic extraction — Users should not have to explicitly save information. The system learns from natural conversation
Entity-first attribution — Every fact must be correctly linked to the right person. Misattributed facts (storing a son's hobby under a colleague's name) are worse than missing facts
Non-blocking extraction — Fact extraction must not add latency to the user's response. It runs in the background while the response streams
Semantic retrieval — Facts must be retrievable by meaning, not just exact text. "What does Martin eat?" should surface dietary facts even if stored as "Martin is vegetarian"

Architecture Overview

The Fact System spans multiple nodes in the Global Supervisor graph and a set of backend services for persistence and retrieval.

flowchart TB
    subgraph Extraction["Extraction Pipeline"]
        MSG([User Message]) --> IC[Intent Classification]
        IC -->|has_extractable_facts| FE[Fact Extraction Node]
        IC -->|entities detected| ER[Entity Resolution Node]
        FE --> EM[Extraction Merge Node]
        ER --> EM
        EM -->|conflict detected| FCR[Fact Conflict Resolution]
        EM -->|no conflict| PERSIST
    end

    subgraph Retrieval["Retrieval Pipeline"]
        IC -->|needs_semantic_retrieval| SR[Semantic Retrieval]
        IC -->|temporal query| TR[Temporal Retrieval]
        SR --> MA[Memory Assembly]
        TR --> MA
    end

    subgraph Storage["Persistence Layer"]
        PERSIST[Fact Persistence Service]
        PERSIST --> PG[(PostgreSQL + pgvector)]
        PERSIST --> EMBED[Embedding Generation]
        EMBED -->|Vertex AI| PG
        MA --> CACHE[(Redis Cache)]
    end

    subgraph Output["To UI Response"]
        MA --> UI[UI Response Nodes]
        UI -->|facts woven into response| RESP([User Response])
    end

Flow summary: Intent classification determines which pipeline stages run. Fact extraction and entity resolution run in parallel, and the Extraction Merge Node then links the extracted facts to the resolved entities. Facts with a detected conflict go to Fact Conflict Resolution; the rest go to the Fact Persistence Service. On the retrieval side, semantic (vector) search and temporal queries pull relevant facts, which Memory Assembly combines for the UI response nodes.
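
The routing described above can be sketched as a small function that maps intent flags to the stages that run. This is a minimal illustration, not the actual graph wiring: `has_extractable_facts` and `needs_semantic_retrieval` come from the diagram, while `entities_detected`, `is_temporal_query`, and the stage names are hypothetical stand-ins for the remaining edge labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentFlags:
    """Flags produced by intent classification for one user message."""
    has_extractable_facts: bool = False
    needs_semantic_retrieval: bool = False
    entities_detected: bool = False   # hypothetical name for "entities detected"
    is_temporal_query: bool = False   # hypothetical name for "temporal query"


def stages_to_run(flags: IntentFlags) -> list[str]:
    """Return the pipeline stages that should run, in diagram order."""
    stages: list[str] = []
    if flags.has_extractable_facts:
        stages.append("fact_extraction")
    if flags.entities_detected:
        stages.append("entity_resolution")
    # The merge node only has work to do if an upstream extraction node ran.
    if flags.has_extractable_facts or flags.entities_detected:
        stages.append("extraction_merge")
    if flags.needs_semantic_retrieval:
        stages.append("semantic_retrieval")
    if flags.is_temporal_query:
        stages.append("temporal_retrieval")
    # Memory assembly combines whatever retrieval results exist.
    if flags.needs_semantic_retrieval or flags.is_temporal_query:
        stages.append("memory_assembly")
    return stages
```

A message with no flags set runs none of the fact stages, which keeps small-talk turns cheap.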

Component Responsibilities

Component	Responsibility
Fact Extraction Node	LLM call to extract facts from the user message. Applies the persistence test (would the user want this remembered next week?)
Entity Resolution Node	Resolves entity mentions against the user's contact database. Detects ambiguity. Creates new entities when red-flag analysis confirms a new person
Extraction Merge Node	Links extracted facts to resolved entities via `subject_entity_id`. Detects fact conflicts. Drops facts about ambiguous or skipped entities
Fact Conflict Resolution Node	Routes conflicting facts (changed email, updated job) through HITL for user confirmation
Semantic Retrieval Node	Vector similarity search against stored fact embeddings. Returns top-K relevant facts
Temporal Retrieval Node	Time-based fact retrieval for date-anchored queries (birthdays, scheduled events, travel)
Memory Assembly Node	Merges semantic and temporal retrieval results into unified fact context
Fact Persistence Service	Stores facts to PostgreSQL with embedding generation, deduplication, and confidence scoring
Fact Retrieval Service	Orchestrates vector search with relevance thresholds and entity-scoped boosting
Preference Extraction Service	Extracts communication preferences from conversation. Conditional — only runs when `has_preferences` flag is set
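
To make the retrieval responsibilities concrete, here is a hedged, in-memory sketch of top-K retrieval with a relevance threshold and entity-scoped boosting. The production system runs this against pgvector; the function name, the default threshold, the boost value, and the choice to apply the threshold before the boost are all assumptions made for illustration.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(
    query_vec: list[float],
    facts: list[dict],
    k: int = 3,
    min_score: float = 0.5,          # assumed relevance threshold
    focus_entity_id: Optional[str] = None,
    entity_boost: float = 0.1,       # assumed boost for the entity in focus
) -> list[tuple[float, str]]:
    """Return up to k (score, text) pairs, best first."""
    scored: list[tuple[float, str]] = []
    for fact in facts:
        score = cosine_similarity(query_vec, fact["embedding"])
        if score < min_score:
            continue  # below the relevance threshold, never boosted back in
        if focus_entity_id is not None and fact["subject_entity_id"] == focus_entity_id:
            score += entity_boost
        scored.append((score, fact["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional embeddings; real embeddings have 2000 dimensions.
SAMPLE_FACTS = [
    {"text": "Martin is vegetarian", "embedding": [1.0, 0.0], "subject_entity_id": "martin"},
    {"text": "Leo loves pizza", "embedding": [0.8, 0.6], "subject_entity_id": "leo"},
    {"text": "User flies to Rome in June", "embedding": [0.0, 1.0], "subject_entity_id": None},
]
```

Calling `retrieve_top_k([1.0, 0.0], SAMPLE_FACTS, k=2)` ranks the Martin fact first; passing `focus_entity_id="leo"` with a large enough boost moves the Leo fact ahead of it.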

Data Model

Fact Schema

Field	Type	Purpose
`fact_id`	UUID	Primary key
`text`	encrypted string	Fact content (max 200 chars), PGP-encrypted
`type`	FactType enum	Category: Schedule, Travel, Health, Relationship, Work, Hobby, etc.
`embedding`	vector(2000)	pgvector embedding for semantic search (Vertex AI `gemini-embedding-001`)
`confidence`	float	0.0–1.0, boosted to 0.9 for critical types (Allergy, Medical, Health)
`subject_entity_id`	UUID (nullable)	Foreign key to `persons` table. NULL for user-about-self facts
`relation_to_user`	boolean	`true` = fact about the user; `false` = fact about an entity
`scope`	string	`"avatar"` (all workspaces) or `"workspace"` (workspace-specific)
`durability`	string	`"durable"`, `"long"`, `"medium"`, `"short"`
`sensitivity`	string	`"low"`, `"medium"`, `"high"` — controls privacy mode filtering
`temporal_marker`	string	Raw time reference ("next Friday", "June 15th")
`time_anchor`	date	Resolved date for temporal queries
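
The schema can be approximated by a plain dataclass that enforces the length limit and the confidence rule. This is a sketch only: the real model is a SQLModel table with encryption and an embedding column, both omitted here. Treating "boosted to 0.9" as a floor (a higher score is kept) is an assumption, as is modelling `type` as a plain string instead of the `FactType` enum.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4

CRITICAL_TYPES = {"Allergy", "Medical", "Health"}
MAX_TEXT_CHARS = 200


@dataclass
class FactRecord:
    text: str
    type: str                      # stand-in for the FactType enum
    confidence: float
    relation_to_user: bool
    subject_entity_id: Optional[UUID] = None
    scope: str = "avatar"          # "avatar" or "workspace"
    durability: str = "medium"     # "durable", "long", "medium", "short"
    sensitivity: str = "low"       # "low", "medium", "high"
    fact_id: UUID = field(default_factory=uuid4)

    def __post_init__(self) -> None:
        if len(self.text) > MAX_TEXT_CHARS:
            raise ValueError(f"fact text exceeds {MAX_TEXT_CHARS} characters")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        # Assumption: the boost is a floor, so 0.95 stays 0.95.
        if self.type in CRITICAL_TYPES:
            self.confidence = max(self.confidence, 0.9)
```

For example, an `Allergy` fact created with confidence 0.6 ends up at 0.9, while a `Hobby` fact keeps its original score.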

Entity Schema (Persons Table)

Field	Type	Purpose
`person_id`	UUID	Primary key
`display_name`	encrypted string	Name (PGP-encrypted)
`role_to_user`	PersonRole enum	partner, child, parent, friend, colleague, pet, service_provider, other
`aliases`	string array	Alternative names and references
`user_id`, `avatar_id`	UUID	Ownership scoping

Fact Attribution Model

Attribution Type	`subject_entity_id`	`relation_to_user`	Example
User fact	NULL	`true`	"User is allergic to peanuts"
Entity fact	person UUID	`false`	"Leo loves pizza"
Orphan fact	NULL	`false`	Must be prevented — skipped during extraction
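
The table above reduces to a two-field check. The sketch below classifies a fact and filters orphans out before persistence; the function names are hypothetical, and the fourth combination (an entity ID together with `relation_to_user = true`) is not covered by the table, so it is reported as `"unspecified"` instead of being guessed.

```python
from typing import Optional
from uuid import UUID, uuid4


def attribution_type(subject_entity_id: Optional[UUID], relation_to_user: bool) -> str:
    """Classify a fact according to the attribution table."""
    if subject_entity_id is None:
        return "user" if relation_to_user else "orphan"
    if relation_to_user:
        return "unspecified"  # combination not defined in the table
    return "entity"


def drop_orphans(facts: list[dict]) -> list[dict]:
    """Keep only facts that are attributed to the user or to an entity."""
    return [
        f for f in facts
        if attribution_type(f["subject_entity_id"], f["relation_to_user"]) != "orphan"
    ]


leo_id = uuid4()
candidates = [
    {"text": "User is allergic to peanuts", "subject_entity_id": None, "relation_to_user": True},
    {"text": "Leo loves pizza", "subject_entity_id": leo_id, "relation_to_user": False},
    {"text": "Someone plays tennis", "subject_entity_id": None, "relation_to_user": False},
]
kept = drop_orphans(candidates)
```

Here `kept` contains the first two facts; the third has no owner and is discarded.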

Key Design Decisions

1. Skip Ambiguous Entities Over Guessing

Chosen: When entity resolution is ambiguous (confidence < 0.8), skip both the entity and associated facts entirely
Rejected: Store with NULL entity; store with best-guess entity; create a new entity
Rationale: Orphan facts (NULL entity, not about user) are unusable and pollute the database. Wrong attribution is worse than missing data. The system loses some valid facts but prevents entity explosion and misattribution
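
A minimal sketch of this decision, assuming the merge step receives extracted facts keyed by the mention they refer to and a list of resolution results with a confidence score. The class and function names are hypothetical; only the 0.8 threshold and the skip behaviour come from the decision above.

```python
from dataclasses import dataclass
from typing import Optional

AMBIGUITY_THRESHOLD = 0.8  # resolutions below this are treated as ambiguous


@dataclass
class Resolution:
    mention: str
    person_id: Optional[str]
    confidence: float


@dataclass
class ExtractedFact:
    text: str
    mention: Optional[str] = None  # None means the fact is about the user


def link_facts(facts: list[ExtractedFact], resolutions: list[Resolution]) -> list[dict]:
    """Attach entity IDs to facts, skipping facts whose entity is ambiguous."""
    confident = {
        r.mention: r.person_id
        for r in resolutions
        if r.person_id is not None and r.confidence >= AMBIGUITY_THRESHOLD
    }
    linked: list[dict] = []
    for fact in facts:
        if fact.mention is None:
            linked.append({"text": fact.text, "subject_entity_id": None, "relation_to_user": True})
        elif fact.mention in confident:
            linked.append({
                "text": fact.text,
                "subject_entity_id": confident[fact.mention],
                "relation_to_user": False,
            })
        # Otherwise the entity was ambiguous or unresolved: the fact is dropped
        # instead of being stored as an orphan or assigned to a best guess.
    return linked
```

With a 0.95-confidence resolution for "Leo" and a 0.6-confidence resolution for "Martin", facts about Leo are linked and facts about Martin are silently dropped.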

2. Background Extraction (Non-Blocking)

Chosen: Fact extraction runs as a background task, not blocking the response
Rejected: Synchronous extraction before response generation
Rationale: Extraction takes 3–4.5 seconds (LLM call + storage), an unacceptable delay to add to every response. The current turn does not need the extracted facts, because the same information is available directly in the message; extracted facts only matter for future turns
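
The non-blocking pattern can be sketched with `asyncio.create_task`: extraction is scheduled, the response streams immediately, and the extraction finishes later. This is an illustration under assumptions, not the actual node implementation; the function names are hypothetical and the sleeps stand in for the LLM call, storage, and token streaming.

```python
import asyncio
from collections.abc import AsyncIterator


async def extract_and_store_facts(message: str, store: list[str]) -> None:
    """Stand-in for the slow extraction path (LLM call + storage)."""
    await asyncio.sleep(0.05)
    store.append(f"fact from: {message}")


async def stream_response(message: str) -> AsyncIterator[str]:
    """Stand-in for the streamed assistant response."""
    for token in ("Sure", ", ", "noted", "."):
        await asyncio.sleep(0)  # yield control, as real streaming would
        yield token


async def handle_turn(message: str, store: list[str]) -> tuple[str, asyncio.Task]:
    # Schedule extraction without awaiting it, then stream the response.
    # The task is returned so the caller keeps a reference to it.
    task = asyncio.create_task(extract_and_store_facts(message, store))
    response = "".join([token async for token in stream_response(message)])
    return response, task


async def demo() -> tuple[str, int, int]:
    store: list[str] = []
    response, task = await handle_turn("My son Leo loves pizza", store)
    stored_at_response_time = len(store)  # extraction has not finished yet
    await task                            # awaited here only to end the demo cleanly
    return response, stored_at_response_time, len(store)
```

Running `asyncio.run(demo())` shows the response completing while the store is still empty; the fact appears only after the background task finishes.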

3. Unified LLM Call for Facts + Entities + Topics

Chosen: Single LLM call extracts all three outputs
Rejected: Separate LLM calls for each extraction type
Rationale: A single call is more token-efficient and faster than separate calls. The LLM has full context to make coherent entity-fact linking decisions within a single pass
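
One way to picture the unified call is as a single structured response carrying all three outputs, so each fact can reference an entity mention from the same pass. The JSON shape below (`facts`, `entities`, `topics`, and the `mention` field) is purely an assumption for illustration; the actual prompt and output schema are not described in this document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class UnifiedExtraction:
    facts: list[dict] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)


def parse_unified_response(raw: str) -> UnifiedExtraction:
    """Parse one LLM response that contains all three extraction outputs."""
    data = json.loads(raw)
    return UnifiedExtraction(
        facts=list(data.get("facts", [])),
        entities=list(data.get("entities", [])),
        topics=list(data.get("topics", [])),
    )


# Hypothetical response for "My son Leo loves pizza".
EXAMPLE_RESPONSE = """
{
  "facts": [{"text": "Leo loves pizza", "mention": "Leo"}],
  "entities": [{"mention": "Leo", "role_to_user": "child"}],
  "topics": ["food"]
}
"""

extraction = parse_unified_response(EXAMPLE_RESPONSE)
# Facts whose mention does not match any entity from the same response.
unlinked = [
    f for f in extraction.facts
    if f.get("mention") not in {e["mention"] for e in extraction.entities}
]
```

Because facts and entities arrive together, a consistency check such as `unlinked` can run without a second model call.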

4. Semantic Deduplication Over String Matching

Chosen: LLM-based entity resolution with red-flag detection and coherence analysis
Rejected: String matching, fuzzy matching, phonetic matching
Rationale: "Martin (son, 8 years old)" and "Martin (colleague, accountant)" share the same name but are clearly different people. Only semantic understanding can detect age-career contradictions and relationship impossibilities

Interfaces and Contracts

Interface	Direction	Format	Consumer
Intent Classification → Fact Pipeline	Inbound	Optimization flags in `GlobalSupervisorState`	Fact Extraction, Entity Resolution nodes
Fact Persistence → PostgreSQL	Outbound	SQLModel ORM with pgvector	Facts table with embeddings
Fact Retrieval → Semantic Search	Bidirectional	Vector similarity query (cosine distance)	Memory Assembly node
Extraction Merge → Conflict Queue	Outbound	Redis key-value (chat_id → conflicts)	Fact Conflict Resolution node
Fact System → UI Response	Outbound	`facts_by_entity` dict in state	UI Response Nodes for prompt injection
Facts API → Settings UI	Bidirectional	REST API (`/api/v1/users/{user_id}/facts`)	Frontend fact management
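
As an illustration of the `facts_by_entity` contract, the sketch below groups retrieved facts by owner and renders them as text for prompt injection. The exact shape of `facts_by_entity` is not specified here, so a mapping from a display label to a list of fact texts is assumed, as are the function names and the `"user"` and `"unknown"` labels.

```python
from collections import defaultdict
from typing import Optional


def group_facts_by_entity(facts: list[dict], names: dict[str, str]) -> dict[str, list[str]]:
    """Group fact texts under the owning entity's display label."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for fact in facts:
        entity_id: Optional[str] = fact["subject_entity_id"]
        label = "user" if entity_id is None else names.get(entity_id, "unknown")
        grouped[label].append(fact["text"])
    return dict(grouped)


def render_for_prompt(facts_by_entity: dict[str, list[str]]) -> str:
    """Render grouped facts as a plain-text block for the response prompt."""
    lines: list[str] = []
    for label, texts in facts_by_entity.items():
        lines.append(f"{label}:")
        lines.extend(f"- {text}" for text in texts)
    return "\n".join(lines)


retrieved = [
    {"text": "User is allergic to peanuts", "subject_entity_id": None},
    {"text": "Leo loves pizza", "subject_entity_id": "p-leo"},
    {"text": "Leo is 8 years old", "subject_entity_id": "p-leo"},
]
facts_by_entity = group_facts_by_entity(retrieved, {"p-leo": "Leo"})
prompt_block = render_for_prompt(facts_by_entity)
```

Grouping by entity before rendering keeps facts about different people visually separate in the prompt, which supports the entity-first attribution requirement.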

Known Trade-offs and Debt

Item	Impact	Remediation
Fact decay is batch-only	A `fact_decay_batch` background job (`swisper/jobs/`) reduces fact relevance over time, but decay is not applied inline during retrieval — old facts may still rank highly until the next batch run	Consider integrating decay scoring directly into the retrieval query for real-time relevance adjustment
Embedding model lock-in	Changing the embedding model (currently Vertex AI `gemini-embedding-001`, dimension 2000) requires re-embedding all stored facts	Maintain embedding model version metadata; build a migration pipeline for re-embedding
Preference persistence gap	Conversational preferences ("be brief") are session-only. Users must use Settings UI for permanent changes, which is not discoverable	Consider auto-promoting frequently repeated session preferences to permanent
Entity resolution is extraction-time only	Once an entity decision is made, it's not revisited. A wrong decision persists unless manually corrected	Add periodic entity reconciliation or allow users to merge/split entities via the Settings UI
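
The first remediation, applying decay inline at retrieval time, could look like the sketch below. It multiplies the similarity score by an exponential age factor chosen per `durability` value. The half-life numbers and the exponential form are assumptions for illustration; the behaviour of the existing `fact_decay_batch` job is not specified in this document.

```python
from datetime import date
from typing import Optional

# Assumed half-lives in days per durability level; None means no decay.
HALF_LIFE_DAYS: dict[str, Optional[int]] = {
    "durable": None,
    "long": 365,
    "medium": 90,
    "short": 14,
}


def decayed_score(similarity: float, durability: str, stored_on: date, today: date) -> float:
    """Scale a similarity score by an age-based decay factor."""
    half_life = HALF_LIFE_DAYS[durability]
    if half_life is None:
        return similarity
    age_days = max((today - stored_on).days, 0)
    # The score halves every `half_life` days.
    return similarity * 0.5 ** (age_days / half_life)
```

Under these assumed values, a `"short"` fact stored 14 days ago keeps half of its similarity score, while a `"durable"` fact is never down-weighted.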