Entity Disambiguation — Architecture

Audience: Architects, tech leads, senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" Assumes technical fluency but explains domain-specific decisions.

This content was migrated from Documentation/ENTITY_DISAMBIGUATION.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

Entity Disambiguation exists as a multi-node subsystem to isolate the complex interactive resolution of ambiguous entity mentions from the rest of the conversation pipeline. When a user mentions a person, the system must determine: (1) does this person exist in the database, (2) if multiple matches exist, which one did the user mean, and (3) does the answer depend on knowing which person was meant. This requires database lookups, embedding similarity searches, LLM-based context analysis, and potentially pausing the pipeline for user input — concerns that would create unmanageable coupling if embedded in the main orchestration flow.

A key architectural driver is the relevance-aware split: the system classifies each ambiguous entity as blocking (must resolve before answering) or non-blocking (can answer first, ask later). This produces four distinct conversational flows depending on relevance and route type, each handled by a dedicated UI node.

Architecture Overview

graph TD
    subgraph Resolution ["Entity Resolution"]
        ER["Entity Resolution\nNode"]
        HS["Hybrid Search\n(exact + embedding)"]
        FE["Fact Enrichment\n(reranker)"]
        CTX["LLM Context\nResolution"]
    end

    subgraph Enrichment ["Contact Enrichment Cascade"]
        ERS["EntityResolution\nService"]
        PDB["Person DB\n(golden source)"]
        CTR["Contact Table\nResolver"]
        EPR["External Provider\nResolver (Google/MS)"]
        HITL_ASK["HITL: Ask\nUser for Email"]
    end

    subgraph Disambiguation ["Disambiguation Flows"]
        DB["Disambiguation\nBlocking"]
        DS["Disambiguation\nSimple (BTW)"]
        DC["Disambiguation\nComplex (BTW)"]
        DR["Disambiguation\nResolution"]
        DA["Disambiguation\nAcknowledgment"]
        CNE["Create New\nEntity"]
    end

    subgraph HITL_Interpret ["HITL Answer Interpretation"]
        T1["Tier 1: Structured\nAction (pill click)"]
        T2["Tier 2: Text Match\n(deterministic)"]
        T3["Tier 3: LLM\nClassification"]
    end

    IC["Intent Classification\n(entities)"] --> ER
    ER --> HS
    HS -->|"0 matches"| NEW["Create Person"]
    HS -->|"1 match"| RESOLVED["Resolved\n(direct)"]
    HS -->|"2+ matches"| FE
    FE --> CTX
    CTX -->|"certainty >= 0.85"| RESOLVED
    CTX -->|"certainty < 0.85"| REL{"Relevance?"}

    RESOLVED --> ERS
    ERS --> PDB
    PDB -->|"has email"| DONE["Pipeline\nContinues"]
    PDB -->|"no email"| CTR
    CTR -->|"found"| DONE
    CTR -->|"not found"| EPR
    EPR -->|"found"| DONE
    EPR -->|"not found"| HITL_ASK
    HITL_ASK --> DONE

    REL -->|"blocking"| DB
    REL -->|"non_blocking\n+ simple_chat"| DS
    REL -->|"non_blocking\n+ complex_chat"| DC

    DB -->|"user responds"| T1
    DS -->|"user responds"| T1
    DC -->|"user responds"| T1
    T1 -->|"no match"| T2
    T2 -->|"no match"| T3
    T1 -->|"matched"| DR
    T2 -->|"matched"| DR
    T3 -->|"responsive_answer"| DR
    T3 -->|"context_switch"| IC
    T3 -->|"recipient_correction"| DR

    DR -->|"matched"| DONE
    DR -->|"someone else"| CNE
    CNE --> DONE

    DS --> DA
    DC --> DA

The Entity Resolution node is the entry point. It receives entity hints from Intent Classification and runs a hybrid search (exact alias match + embedding similarity) against the user's stored contacts. For single matches, it resolves directly without an LLM call. For multiple matches, it enriches candidates with stored facts, then calls an LLM for context-aware resolution with a certainty score and relevance classification. If certainty is below 0.85, the system triggers one of three disambiguation UI flows based on relevance and route type. The Disambiguation Resolution node processes the user's response (via fast-path keyword matching or LLM semantic matching) and either resolves the entity, creates a new one, or times out.
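The branching described above can be sketched as a small decision function. This is an illustrative sketch, not the module's code: `ContextResolution` and `next_step` are hypothetical names, and only the 0.85 threshold and the 0 / 1 / 2+ match split come from this document.

```python
from dataclasses import dataclass
from typing import Optional

CERTAINTY_THRESHOLD = 0.85  # threshold stated in this document


@dataclass
class ContextResolution:
    """Hypothetical stand-in for the LLM result (not the real ContextResolutionResult)."""
    best_match_id: Optional[str]
    certainty: float
    relevance: str  # "blocking" or "non_blocking"


def next_step(candidate_count: int,
              llm_result: Optional[ContextResolution] = None) -> str:
    """Return the branch taken after hybrid search, mirroring the diagram edges."""
    if candidate_count == 0:
        return "create_person"
    if candidate_count == 1:
        return "resolved"  # direct resolution, no LLM call
    if llm_result is None:
        raise ValueError("2+ candidates require an LLM context resolution result")
    if llm_result.certainty >= CERTAINTY_THRESHOLD:
        return "resolved"
    return "disambiguate"  # relevance then selects the UI flow
```

Note that the comparison is inclusive: a certainty of exactly 0.85 resolves without asking the user, matching the `>= 0.85` edge in the diagram.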

Component Responsibilities

| Component | Responsibility |
| --- | --- |
| Entity Resolution Node (`entity_resolution.py`) | Entry point. Receives entity hints from Intent Classification, runs hybrid candidate search, invokes LLM for context-aware resolution, detects singleton role conflicts, creates Person records for new entities. Includes umlaut normalization (`normalize_entity_name()`) for German name equivalence (Müller = Mueller = Muller). |
| Hybrid Search (`find_candidates_hybrid()`) | Two-stage candidate selection: exact alias match (fast, certain) then embedding similarity search (catches typos, variations). Merges and deduplicates results. Exact full-name match fast path resolves deterministically without LLM when exactly one candidate matches the full display name. |
| Fact Enrichment (reranker-based) | Replaces hardcoded type-based fact filtering with reranker-based query relevance ranking (ADR-003). Facts are ranked by relevance to the user's message, so "broke her leg" ranks high for "wishing a good recovery" regardless of fact type. Falls back to recency sort if reranker unavailable. |
| LLM Context Resolution (`_resolve_with_llm_context()`) | Calls LLM with candidate facts and conversation context. Returns certainty score, best match, and relevance classification (blocking vs non-blocking). Uses `ContextResolutionResult` Pydantic schema. Context resolution prompt uses graduated Q2 scoring: direct event reference (+0.45) vs unique topical match (+0.30). |
| EntityResolutionService (`services/contact/service.py`) | Enrichment cascade for contact info (spec §5.4): Person table → Contact table → External providers → HITL signal. Domain-agent agnostic: no imports from agent packages. Every resolution that discovers new emails enriches the Person record so the system never asks the same question twice. |
| ContactTableResolver (`resolvers/contact_table.py`) | Queries the Contact table by person name to find email addresses from synced email headers. Uses tiered matching: exact, substring, then token-based (handles "Surname, First" vs "First Surname"). |
| ExternalProviderResolver (`resolvers/external_provider.py`) | Wraps Google People API and MS Graph contact lookups into a single resolver. Resolution order: Google first, then Microsoft. Stops at the first provider that returns results. |
| HITL Answer Classifier (`hitl_answer_classifier.py`) | Three-tier classification for HITL responses. Tier 1: structured JSON action (pill click), deterministic. Tier 2: text match to presented options, deterministic. Tier 3: LLM classification for ambiguous free text; detects `responsive_answer`, `recipient_correction`, `context_switch`, and `escape` intents. |
| Disambiguation Blocking Node (`disambiguation_blocking.py`) | Generates ask-only HITL question for blocking entities. Streams response to frontend with clickable `<swi-reply>` options. Sets `user_in_the_loop` to pause pipeline. |
| Disambiguation Simple Text (`disambiguation_simple_text.py`) | Generates answer + "by the way" question for non-blocking entities on simple_chat route. Answers the user's question first, then appends disambiguation. |
| Disambiguation Complex Text (`disambiguation_complex_text.py`) | Generates answer + "by the way" question for non-blocking entities on complex_chat route. Runs agent execution first, then appends disambiguation. |
| Disambiguation Resolution Node (`disambiguation_resolution.py`) | Processes user's disambiguation response. Uses fast-path classifier (action markers, exact name match, number selection) before falling back to LLM semantic matching. Persists pending facts, handles sequential multi-entity flow. |
| Disambiguation Acknowledgment (`disambiguation_acknowledgment.py`) | Generates brief acknowledgment ("Got it, Thomas Weber!") after non-blocking disambiguation is resolved. |
| Create New Entity Node (`create_new_entity.py`) | Handles the "someone else" flow: creates a new Person record and asks for relationship details. |
| Role Lookup (`_lookup_person_by_role()`) | Three-strategy role resolution: exact match, synonym-based match (ROLE_SYNONYMS map), semantic embedding search. Used for role-only entities like "my wife." |
| Sequential State Helpers | Functions for tracking multi-entity disambiguation: `get_current_disambiguation_entity()`, `advance_to_next_entity()`, `mark_entity_resolved()`, `build_sequential_ambiguity_state()`. |
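As an illustration of the tiered name matching attributed to ContactTableResolver, the sketch below tries exact, then substring, then token-based comparison. The function names are hypothetical, and treating "token-based" as an order-insensitive comparison of name tokens is an assumption; the real resolver may use a looser rule.

```python
import re
from typing import FrozenSet, Optional


def _tokens(name: str) -> FrozenSet[str]:
    # Split on non-word characters, so "weber, thomas" -> {"weber", "thomas"}.
    return frozenset(t for t in re.split(r"\W+", name.lower()) if t)


def match_tier(query: str, stored: str) -> Optional[str]:
    """Return the first tier that matches, or None (hypothetical helper)."""
    q, s = query.strip().lower(), stored.strip().lower()
    if not q or not s:
        return None
    if q == s:
        return "exact"
    if q in s or s in q:
        return "substring"
    # Assumed token rule: same set of name tokens, in any order.
    if _tokens(q) == _tokens(s):
        return "token"
    return None
```

With this rule, `match_tier("Thomas Weber", "Weber, Thomas")` falls through the first two tiers and matches on tokens, which is the "Surname, First" case the table mentions.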

Data Model

The module uses both persistent data (Person records in PostgreSQL) and in-flight state (the entity_ambiguity structure in pipeline state).

| Structure | Contents | Lifecycle |
| --- | --- | --- |
| `Person` (PostgreSQL) | `person_id`, `display_name`, `role_to_user`, `aliases`, `is_singleton_role`, `embedding`, `electronic_addresses`, `phone_numbers` | Persistent. Created by Entity Resolution when a new entity is detected. Queried for candidate matching. |
| `UserFact` (PostgreSQL) | Facts linked to a Person via `subject_entity_id` | Persistent. Used to enrich candidates with distinguishing facts during resolution. Pending facts persisted after disambiguation. |
| `entity_ambiguity` (state) | `active`, `created_at`, `current_index`, `entities[]` (each with `text`, `candidates[]`, `relevance`, `status`, `resolved_person_id`), `pending_facts[]`, `original_message`, `original_intent_classification` | In-flight. Created by Entity Resolution when ambiguity is detected. Updated by Disambiguation Resolution as entities are resolved. Cleared when all entities are resolved or on timeout. |
| `resolved_entities` (state) | List of `{text, is_ambiguous, is_new, matched_person_id, email, phone}` | In-flight. Written by Entity Resolution (for direct matches) or Disambiguation Resolution (after user selection). Consumed by downstream nodes. |
| Resolution flags (state) | `btw_disambiguation_resolved`, `blocking_disambiguation_resolved` | In-flight. Set by Disambiguation Resolution to signal routing: BTW triggers acknowledgment node, blocking triggers answer generation. |
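The in-flight structures above can be written down as typed dictionaries to make the shape explicit. The field names come from the table; the class names, the value types, and the `status` values are assumptions made for illustration only.

```python
from typing import List, Optional, TypedDict


class AmbiguousEntity(TypedDict):
    text: str
    candidates: List[dict]
    relevance: str                     # "blocking" or "non_blocking"
    status: str                        # assumed values: "pending" / "resolved"
    resolved_person_id: Optional[str]


class EntityAmbiguity(TypedDict):
    active: bool
    created_at: str                    # assumed to be an ISO 8601 timestamp
    current_index: int
    entities: List[AmbiguousEntity]
    pending_facts: List[dict]
    original_message: str
    original_intent_classification: dict


class ResolvedEntity(TypedDict):
    text: str
    is_ambiguous: bool
    is_new: bool
    matched_person_id: Optional[str]
    email: Optional[str]
    phone: Optional[str]


# Example: one blocking entity with two candidates, not yet resolved.
example_state: EntityAmbiguity = {
    "active": True,
    "created_at": "2024-01-01T12:00:00Z",
    "current_index": 0,
    "entities": [{
        "text": "Thomas",
        "candidates": [{"person_id": "p1"}, {"person_id": "p2"}],
        "relevance": "blocking",
        "status": "pending",
        "resolved_person_id": None,
    }],
    "pending_facts": [],
    "original_message": "Send Thomas the report",
    "original_intent_classification": {},
}
```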

Key Design Decisions

Decision 1: Relevance-aware disambiguation (blocking vs non-blocking)

Chosen: The LLM classifies each ambiguous entity's relevance to the query. Blocking entities trigger ask-first flows; non-blocking entities trigger answer-first flows.
Rejected: Always asking before answering (the original design before the relevance system).
Rationale: The old approach produced redundant answers — the system would answer, then ask, then answer again. The relevance split eliminates redundancy and improves UX. The trade-off is a more complex routing graph with four distinct flows instead of one.
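A minimal sketch of the relevance-based routing, using the edge labels from the architecture diagram. The function name is hypothetical, and returning the node module names as identifiers is an assumption; the actual routing logic may key on different values.

```python
def select_disambiguation_node(relevance: str, route: str) -> str:
    """Pick the UI node for an ambiguous entity (illustrative only)."""
    if relevance == "blocking":
        # Ask first: the answer depends on knowing who was meant.
        return "disambiguation_blocking"
    if relevance == "non_blocking" and route == "simple_chat":
        # Answer first, then append the "by the way" question.
        return "disambiguation_simple_text"
    if relevance == "non_blocking" and route == "complex_chat":
        # Run agent execution first, then append the question.
        return "disambiguation_complex_text"
    raise ValueError(f"unsupported combination: {relevance!r}, {route!r}")
```

In this sketch a blocking entity ignores the route type, which matches the single "blocking" edge in the diagram.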

Decision 2: Hybrid search (exact + embedding) for candidate selection

Chosen: Two-stage search: exact alias match first (fast, O(n) scan), then embedding similarity search (catches typos and variations).
Rejected: Embedding-only search, or database full-text search.
Rationale: Exact matching handles the common case (user types the name correctly) with negligible added latency. Embedding search covers edge cases (typos, nicknames, transliterations). The combination provides both speed and robustness.
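The two-stage search can be sketched as follows. This is not `find_candidates_hybrid()` itself: the function names, the dictionary shape of a person, the inclusion of `display_name` in the exact stage, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_candidates(query: str, query_embedding: Sequence[float],
                    people: List[Dict], min_similarity: float = 0.8) -> List[str]:
    needle = query.strip().casefold()
    # Stage 1: exact match against display name and aliases (linear scan).
    exact = [p["person_id"] for p in people
             if needle in {n.casefold() for n in [p["display_name"], *p["aliases"]]}]
    # Stage 2: embedding similarity over all people, best score first.
    scored = sorted(((cosine(query_embedding, p["embedding"]), p["person_id"])
                     for p in people), reverse=True)
    similar = [pid for score, pid in scored if score >= min_similarity]
    # Merge: exact hits first, then similar ones, dropping duplicates in order.
    return list(dict.fromkeys(exact + similar))
```

Putting exact hits first keeps the certain matches at the top of the merged list, while `dict.fromkeys` removes people found by both stages without reordering.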

Decision 3: Fast-path classifier before LLM for disambiguation responses

Chosen: A deterministic fast-path classifier handles action markers from frontend pill clicks, number selections, and exact name matches. LLM is only called for free-text responses that need semantic understanding.
Rejected: Always using LLM to interpret disambiguation responses.
Rationale: Most disambiguation responses come from pill clicks, which include machine-readable action markers (e.g., [ACTION:select_0]). Routing these through the LLM would add 200–400ms latency for no accuracy benefit. The LLM is reserved for ambiguous typed responses.
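A sketch of such a fast path is shown below. Only the `[ACTION:select_0]` marker format comes from this document; the function name, the zero-based index in markers, and the one-based interpretation of typed numbers are assumptions. Returning `None` means "fall through to the LLM".

```python
import re
from typing import List, Optional

ACTION_RE = re.compile(r"\[ACTION:select_(\d+)\]")


def fast_path_select(response: str, candidate_names: List[str]) -> Optional[int]:
    """Return the selected candidate index, or None if the LLM is needed."""
    text = response.strip()
    # 1) Action marker from a pill click (assumed zero-based).
    marker = ACTION_RE.search(text)
    if marker:
        index = int(marker.group(1))
        return index if index < len(candidate_names) else None
    # 2) Typed number such as "2" (assumed one-based, as a user would count).
    if re.fullmatch(r"[0-9]+", text):
        index = int(text) - 1
        return index if 0 <= index < len(candidate_names) else None
    # 3) Exact, case-insensitive name match; must be unambiguous.
    lowered = text.casefold()
    hits = [i for i, name in enumerate(candidate_names) if name.casefold() == lowered]
    return hits[0] if len(hits) == 1 else None
```

All three checks are deterministic string operations, so a pill click never pays the LLM round trip described in the rationale.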

Decision 4: Sequential multi-entity disambiguation

Chosen: When multiple entities are ambiguous, resolve them one at a time across sequential turns, prioritizing blocking entities.
Rejected: Resolving all entities in a single multi-option question.
Rationale: Presenting all ambiguous entities at once creates overwhelming UI and confuses users. Sequential resolution is more natural and allows the system to skip non-blocking entities if they become irrelevant. The trade-off is more conversation turns for multi-entity cases.
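The sequential flow can be illustrated with two small pure functions. These are hypothetical and are not the module's `build_sequential_ambiguity_state()` or `advance_to_next_entity()` helpers, whose signatures are not given here; the `status` values are also assumed.

```python
from typing import Dict, List


def build_state(entities: List[Dict]) -> Dict:
    """Order blocking entities first and start at index 0."""
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(entities, key=lambda e: 0 if e["relevance"] == "blocking" else 1)
    return {
        "active": bool(ordered),
        "current_index": 0,
        "entities": [dict(e, status="pending", resolved_person_id=None) for e in ordered],
    }


def resolve_current(state: Dict, person_id: str) -> Dict:
    """Mark the current entity resolved and advance; returns a new state."""
    entities = [dict(e) for e in state["entities"]]
    index = state["current_index"]
    entities[index].update(status="resolved", resolved_person_id=person_id)
    next_index = index + 1
    return {**state, "entities": entities, "current_index": next_index,
            "active": next_index < len(entities)}
```

Each call to `resolve_current` corresponds to one conversation turn, and the state becomes inactive once the last entity is resolved.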

Interfaces and Contracts

| Interface | Direction | Consumer | Contract |
| --- | --- | --- | --- |
| `state["intent_classification"]["entities"]` | Inbound | Intent Classification node | List of `ExtractedEntity` dicts with `text`, `type`, `role`, `is_singleton` |
| `state["resolved_entities"]` | Outbound | Semantic Retrieval, Fact Extraction, Planner | List of `{text, is_ambiguous, is_new, matched_person_id, email, phone}` |
| `state["entity_ambiguity"]` | Outbound/Internal | Disambiguation UI nodes, Disambiguation Resolution | Sequential state structure with `entities[]`, `current_index`, `pending_facts[]`, `original_message` |
| `state["user_in_the_loop"]` | Outbound | Global Supervisor routing | `UserInTheLoop` object that triggers pipeline interrupt for user response |
| `Person` table (PostgreSQL) | Inbound/Outbound | Person Service, Fact Extraction | Read for candidate search; write for new entity creation |
| `UserFact` table (PostgreSQL) | Inbound | Fact Extraction Service | Read for candidate enrichment during context resolution |
| `<swi-reply>` tags | Outbound | Frontend | HTML tags in streamed response containing clickable disambiguation options with action markers |

Breaking change note: The entity_ambiguity state structure is consumed by all disambiguation UI nodes and the resolution node. Changes to this structure require coordinating updates across 6+ node files and the routing logic.

Decision 5: Enrichment cascade for contact resolution (SA-156)

Chosen: Four-step cascade: Person DB → Contact table → External providers (Google/MS) → HITL. Each step that discovers emails enriches the Person record. The service is domain-agent agnostic.
Rejected: Keeping contact resolution inside the productivity facade with direct provider calls.
Rationale: The old approach created duplicate Person records when HITL provided an email for an already-disambiguated person. The cascade ensures enrichment flows through a single service, and PendingContact.person_id carries the identity through the full HITL lifecycle.
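The cascade can be sketched as an ordered list of lookups with write-back on success. Everything here is illustrative: `resolve_email`, the flat `email` key (the Person model stores `electronic_addresses`), and the result dictionary are assumptions, and the real service persists enrichment to the database rather than mutating a dict.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A resolver takes a person name and returns an email or None (assumed shape).
Resolver = Callable[[str], Optional[str]]


def resolve_email(person: Dict, resolvers: List[Tuple[str, Resolver]]) -> Dict:
    """Walk the cascade: Person record first, then each resolver in order."""
    if person.get("email"):
        return {"status": "resolved", "email": person["email"], "source": "person_db"}
    for source, lookup in resolvers:
        email = lookup(person["display_name"])
        if email:
            # Enrich the Person record so later calls stop at the first step.
            person["email"] = email
            return {"status": "resolved", "email": email, "source": source}
    # Nothing found: signal that the user must be asked (HITL).
    return {"status": "needs_hitl", "person_id": person["person_id"]}
```

Because a successful lookup writes back to the person, a second call for the same person returns from the `person_db` step, which is the "never asks the same question twice" property described for the service.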

Decision 6: Three-tier HITL answer interpretation (SA-156)

Chosen: Tier 1 (structured action) and Tier 2 (text match) are deterministic and handle all current UI interactions. Tier 3 (LLM) only fires for ambiguous free text. Classifications: responsive_answer, recipient_correction, context_switch, escape.
Rejected: Always using LLM for all HITL responses.
Rationale: Tiers 1 and 2 handle >95% of responses without an LLM call, so they add no model latency. Tier 3 catches edge cases such as voice input ("no, I meant leo.cozzoli@fintama.com") and context switches ("actually, check my calendar"). On LLM failure, Tier 3 defaults to responsive_answer, which matches the previous behavior.
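The tiering, including the fallback on LLM failure, might look like the sketch below. The four intent names come from this document; the function name, the `{"action": ...}` payload shape for pill clicks, and the injected `llm_classify` callable are assumptions.

```python
import json
from typing import Callable, Dict, List

VALID_INTENTS = {"responsive_answer", "recipient_correction", "context_switch", "escape"}


def classify_hitl_answer(raw: str, options: List[str],
                         llm_classify: Callable[[str], str]) -> Dict:
    # Tier 1: structured JSON action from a pill click (assumed payload shape).
    try:
        payload = json.loads(raw)
    except ValueError:
        payload = None
    if isinstance(payload, dict) and "action" in payload:
        return {"tier": 1, "intent": "responsive_answer", "value": payload["action"]}
    # Tier 2: case-insensitive text match against the presented options.
    text = raw.strip().casefold()
    for option in options:
        if option.casefold() == text:
            return {"tier": 2, "intent": "responsive_answer", "value": option}
    # Tier 3: LLM classification, defaulting to responsive_answer on any failure.
    try:
        intent = llm_classify(raw)
    except Exception:
        intent = "responsive_answer"
    if intent not in VALID_INTENTS:
        intent = "responsive_answer"
    return {"tier": 3, "intent": intent, "value": raw}
```

The LLM callable is only invoked when both deterministic tiers miss, so it can be a slow or unreliable dependency without affecting pill clicks or exact text replies.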

Decision 7: Umlaut normalization via expansion, not collapsing (SA-156)

Chosen: Expand Unicode umlauts to digraphs (ü→ue, ö→oe, ä→ae, ß→ss), then strip remaining diacritics. Both Müller and Mueller normalize to mueller.
Rejected: Collapsing digraphs to base vowels (ue→u, oe→o). This mangled non-German names: Joel→jol, Sue→su, Bauer→bar.
Rationale: Expansion is safe for all names because only the non-ASCII characters (ü, ö, ä, ß, and any remaining diacritics) are transformed. ASCII names pass through untouched.
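The expansion approach can be sketched with the standard library alone. This is an illustrative version, not the code of `normalize_entity_name()`: the function name, the initial NFC step, and the final `casefold()` are assumptions.

```python
import unicodedata

# Expand umlauts and ß to digraphs; upper-case variants included for completeness.
_EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def normalize_name(name: str) -> str:
    # Compose first so "u" + combining diaeresis becomes a single "ü".
    expanded = unicodedata.normalize("NFC", name).translate(_EXPANSIONS)
    # Decompose and drop remaining combining marks (for example é -> e).
    decomposed = unicodedata.normalize("NFKD", expanded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Under this sketch `normalize_name("Müller")` and `normalize_name("Mueller")` both give `"mueller"`, while `"Joel"`, `"Sue"`, and `"Bauer"` are only lower-cased. A plain `"Muller"` stays `"muller"`, so equating it with `"mueller"` would need a separate mechanism.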

Known Trade-offs and Debt

Full table scan for alias matching: The exact alias match queries all Person records for an avatar and checks aliases in Python rather than using a database index. This is acceptable for the current scale (users have <100 contacts) but would need indexing for larger contact lists.
Embedding search is in-application: Cosine similarity is computed in Python over all Person embeddings rather than using a vector database index (e.g., pgvector ANN). This limits scalability but avoids infrastructure complexity at current scale.
No disambiguation learning: The system doesn't learn from past disambiguation choices. If the user always means "Thomas Weber (friend)" when they say "Thomas," the system still asks every time there are multiple Thomases. A frequency-based prior could improve this.
Singleton role detection relies on LLM: Whether a role is "singleton" (typically one person) is determined by the LLM during intent classification. Misclassification (e.g., treating "cousin" as singleton) can cause false role conflict alerts.
One-turn timeout is aggressive: If the user doesn't respond to the disambiguation question on the very next turn, the system times out and drops pending facts. A more forgiving approach would allow the user to return to disambiguation across multiple turns.