Entity Disambiguation: Overview

Audience: Business stakeholders, product owners, analysts, new team members. This document answers "what does this module do and why does it matter?" in plain language. No unexplained technical jargon.

This content was migrated from Documentation/ENTITY_DISAMBIGUATION.md and restructured into audience sections. Review for accuracy against the current codebase.

What This Module Does

When a user mentions a person — by name, relationship, or pronoun — Swisper needs to know exactly who they mean. Entity Disambiguation handles situations where the system has multiple people on file who could match. For example, if the user says "schedule lunch with Thomas" and the system knows two people named Thomas, this module figures out which one before proceeding.

The key innovation is relevance-aware disambiguation. Not every ambiguous mention needs to be resolved before answering. If the user says "Thomas was at a party — what is MCP?", the answer about MCP doesn't depend on which Thomas. In that case, the system answers the question first and asks "by the way, which Thomas?" afterward. But if the user says "suggest a restaurant for Thomas," dietary preferences matter — so the system asks first, then answers. This avoids the frustrating pattern of asking unnecessary questions or giving answers that have to be repeated.
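
For readers who want to see the ask-first versus answer-first choice as logic, here is a minimal sketch. It is illustrative only: `AmbiguousMention`, its two flags, and `decide_timing` are hypothetical names, and how Swisper actually judges relevance is not described in this document. The third outcome reflects the skip behavior listed under Key Capabilities.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    ASK_FIRST = "ask_first"        # blocking: clarify before answering
    ANSWER_FIRST = "answer_first"  # non-blocking: answer, then clarify
    SKIP = "skip"                  # identity is irrelevant, so never ask


@dataclass
class AmbiguousMention:
    name: str
    affects_answer: bool  # assumption: supplied by an upstream relevance check
    affects_facts: bool   # assumption: True if a fact must be stored for this person


def decide_timing(mention: AmbiguousMention) -> Timing:
    """Pick when (or whether) to ask the user which person is meant."""
    if mention.affects_answer:
        return Timing.ASK_FIRST
    if mention.affects_facts:
        return Timing.ANSWER_FIRST
    return Timing.SKIP


# "Suggest a restaurant for Thomas": identity changes the answer.
restaurant = AmbiguousMention("Thomas", affects_answer=True, affects_facts=False)
# "Thomas was at a party, what is MCP?": identity only matters for the stored fact.
party = AmbiguousMention("Thomas", affects_answer=False, affects_facts=True)

print(decide_timing(restaurant).value)
print(decide_timing(party).value)
```

The ordering of the checks is the design point: relevance to the answer always wins, so a mention that affects both the answer and stored facts is treated as blocking.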

The module also handles situations where the user mentions someone the system doesn't recognize at all ("someone else"), where a newly mentioned person conflicts with an existing relationship (e.g., a second urologist when the system already knows one), and where multiple ambiguous people appear in the same message.

Who It Serves

End users who mention people in conversation and expect the assistant to keep track of who's who — without being pestered with unnecessary clarification questions.
Product owners who need to understand how the system handles the complexity of personal relationship management and when it asks the user for help.
New team members looking to understand one of the most complex interactive flows in Swisper's conversation pipeline.

Key Capabilities

Resolves entity mentions by matching names against stored contacts using exact matching, alias matching, and semantic similarity. Supports German umlaut equivalence (Müller = Mueller = Muller) and Unicode accent folding (é→e, ñ→n) without mangling non-German names.
Classifies ambiguous mentions as blocking (must ask first) or non-blocking (answer first, ask later) based on whether the person's identity affects the answer quality.
Presents disambiguation questions with clickable options, including a "someone else" option for unknown people.
Enrichment cascade for contact information: when a resolved person lacks an email, the system automatically searches synced email headers, then Google/Microsoft contacts, before asking the user. Each discovery enriches the Person record so the question is never asked again.
Three-tier human-in-the-loop (HITL) answer interpretation: clicks on an option pill and typed selections resolve instantly, without calling a language model (LLM). Ambiguous free text (including voice input) is classified by an LLM to detect recipient corrections, context switches, and escape intents.
Handles singleton role conflicts: when a user mentions a new person in a role where they typically have only one (e.g., a new urologist when one is already on file), the system asks whether the new person replaces the existing one or is an additional one.
Supports multi-entity disambiguation — when a message mentions multiple ambiguous people, the system resolves the most important one first and handles the rest sequentially.
Skips disambiguation entirely when the ambiguous person is irrelevant to both the answer and fact storage, avoiding unnecessary questions.
Creates new contact records automatically when users mention people the system hasn't seen before.
Uses reranker-based fact enrichment to select the most relevant facts for each candidate during context resolution, replacing hardcoded type-based filtering.
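
The umlaut equivalence and accent folding in the first capability can be illustrated with a small sketch. This is one possible approach, not necessarily the one Swisper uses: `match_keys` and `names_equivalent` are hypothetical helpers. Each name gets a small set of comparison keys, and two names count as equivalent when their key sets overlap. The original spelling is always kept as a key, which is what keeps non-German names intact.

```python
import unicodedata

# German umlauts and their conventional two-letter spellings.
_UMLAUT_DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def fold_accents(text: str) -> str:
    """Strip combining marks, e.g. 'é' -> 'e', 'ñ' -> 'n', 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_keys(name: str) -> set[str]:
    """Return comparison keys for a name (hypothetical helper)."""
    lowered = unicodedata.normalize("NFC", name).casefold().strip()
    folded = fold_accents(lowered)  # müller -> muller
    digraph = fold_accents(
        "".join(_UMLAUT_DIGRAPHS.get(ch, ch) for ch in lowered)
    )  # müller -> mueller
    collapsed = digraph
    for two_letters, base in (("ae", "a"), ("oe", "o"), ("ue", "u")):
        collapsed = collapsed.replace(two_letters, base)  # mueller -> muller
    return {folded, digraph, collapsed}


def names_equivalent(a: str, b: str) -> bool:
    """Two names are equivalent when they share at least one key."""
    return bool(match_keys(a) & match_keys(b))


print(names_equivalent("Müller", "Mueller"))
print(names_equivalent("Mueller", "Muller"))
print(names_equivalent("Thomas", "Tomas"))
```

Under this scheme the collapsed key is only an extra way to match. A name such as "Samuel" still carries the key "samuel", so it is never rewritten into something else.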

How It Fits in the Platform

Entity Disambiguation sits immediately after Intent Classification in the Global Supervisor pipeline. It receives the entity mentions extracted by Intent Classification and resolves them against the user's stored contacts. If disambiguation is needed, the pipeline pauses for a Human-in-the-Loop (HITL) interaction — the user sees clickable options and selects the right person. Once resolved, the entities flow downstream to Semantic Retrieval (which loads relevant facts about the person), Fact Extraction (which stores new facts linked to the correct person), and the Planner (which uses contact details for scheduling, emailing, etc.). The module also interacts with the UI Response System, which generates the natural-language disambiguation questions and acknowledgments.

Limits and Edge Cases

Embedding similarity threshold: The system uses a 0.70 cosine similarity threshold for matching names. Names that are spelled very differently from stored entries (heavy typos or transliterations) may not match, resulting in the system treating them as a new person.
Pronoun resolution is limited: Pronouns like "him" or "her" rely on the previous turn's entity context. If the user refers to someone mentioned several turns ago, the system may not resolve the pronoun correctly.
Single-turn HITL timeout: If the user doesn't respond to a disambiguation question on the next turn (e.g., they change the topic), the system times out, drops pending facts for the ambiguous entity, and moves on. This prevents the conversation from getting stuck, but means some facts may be lost.
Sequential disambiguation can feel verbose: When multiple people in a message are ambiguous, the system asks about them one at a time across multiple turns. For messages with three or more ambiguous names, this can feel like an interrogation.
External provider availability: The enrichment cascade depends on Google People API and MS Graph being connected. If neither integration is configured, the cascade skips directly to HITL.
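
The enrichment cascade and its fallback can be pictured as an ordered list of lookups that stops at the first hit. The sketch below is a simplified illustration: `enrich_email`, the source labels, and the two stand-in lookup functions are hypothetical and do not call any real Google or Microsoft API.

```python
from typing import Callable, Optional

# A lookup takes a person's name and returns an email address or None.
Lookup = Callable[[str], Optional[str]]


def enrich_email(person: dict, lookups: list[tuple[str, Lookup]]) -> str:
    """Try each source in order; return the step that settled the question."""
    if person.get("email"):
        return "already_known"
    for source, lookup in lookups:
        email = lookup(person["name"])
        if email:
            person["email"] = email  # enrich the record so the user is not asked again
            return source
    return "ask_user"  # nothing found, or no integration configured


# Hypothetical stand-ins for the real integrations.
def search_email_headers(name: str) -> Optional[str]:
    return None


def search_provider_contacts(name: str) -> Optional[str]:
    return "thomas@example.com" if name == "Thomas Meier" else None


person = {"name": "Thomas Meier", "email": None}
cascade = [
    ("email_headers", search_email_headers),
    ("provider_contacts", search_provider_contacts),
]
print(enrich_email(person, cascade))
print(person["email"])
```

Passing an empty list of lookups models the case where no provider is connected: the function falls straight through to asking the user.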

FAQ

Q: Why does Swisper sometimes ask "which Thomas?" before answering, and sometimes after? A: It depends on whether knowing which Thomas matters for the answer. If you ask "suggest a restaurant for Thomas," the system needs to know Thomas's dietary preferences before it can recommend anything — so it asks first. But if you say "Thomas was at a party — what is MCP?", the answer about MCP is the same regardless of which Thomas, so the system answers first and asks casually afterward.

Q: What happens if I mention someone Swisper doesn't know? A: The system shows a "someone else" option alongside the known matches. If you select it, the system creates a new contact record and asks a brief follow-up question to learn a bit about the new person (like how you know them).

Q: Can the system get it wrong and link information to the wrong person? A: The system uses a confidence threshold (85%) for automatic resolution. Below that threshold, it always asks you. However, if two people have very similar names and roles, and the context doesn't clearly distinguish them, there's a small chance of misattribution. You can always correct this by telling the assistant.
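
As a rough illustration of the 85% rule in the answer above, the sketch below resolves automatically only when the best candidate reaches the threshold and otherwise returns the ranked options to show the user. `resolve_or_ask` and the score dictionary are hypothetical, and how Swisper computes confidence, or handles two candidates that both score above the threshold, is not covered in this document.

```python
AUTO_RESOLVE_THRESHOLD = 0.85  # the 85% confidence threshold from the answer above


def resolve_or_ask(scores: dict[str, float]) -> tuple[str, list[str]]:
    """Return ("resolved", [best]) or ("ask_user", ranked candidates)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if ranked and scores[ranked[0]] >= AUTO_RESOLVE_THRESHOLD:
        return "resolved", ranked[:1]
    return "ask_user", ranked


print(resolve_or_ask({"thomas_a": 0.91, "thomas_b": 0.40}))
print(resolve_or_ask({"thomas_a": 0.60, "thomas_b": 0.72}))
```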

Q: What if I have two urologists — won't the system get confused? A: The system tracks "singleton roles" — roles where most people have only one person (like a spouse or primary doctor). If you mention a second person in a singleton role, the system asks whether the new person replaces the existing one or is an additional one. You can have multiple people in any role.