
ADR-003: Fact Retrieval Reranker

Status: Implemented ✅
Date: 2026-02-02 (spec), 2026-02-03 (implemented)
Author: Architecture Team
Related: Semantic Retrieval, Entity Context Loader, Safety-Critical Fact Retrieval


Context

The current fact retrieval system uses pure vector similarity search (cosine distance) to find relevant facts for a user query. This approach has a fundamental limitation:

Semantic similarity ≠ task relevance

The Problem

When a user says "I'm going skiing tomorrow", the system should retrieve:

  • Travel/activity facts about skiing (works fine: high semantic similarity)
  • Health facts about injuries (fails: low semantic similarity despite critical relevance)

Example:

| Fact | Similarity to "going skiing" | Should retrieve? |
|------|------------------------------|------------------|
| "User is going skiing on Feb 3" | 1.00 | Yes ✅ |
| "User recovering from flu" | 0.60 | Maybe |
| "User's flight to Barcelona" | 0.59 | No |
| "User's birthday coming up" | 0.59 | No |
| "User broke their leg" | 0.58 | Yes (safety!) ❌ |

The broken leg fact ranks 5th despite being critically relevant for skiing safety. With top_k=3, it gets excluded.
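To make the failure concrete, here is a minimal sketch of pure similarity ranking over the scores in the example table, assuming facts are simply sorted by similarity and truncated to top_k. The helper `top_k_by_similarity` is hypothetical, not taken from the codebase:

```python
# Similarity scores from the example table (query: "going skiing").
FACTS = [
    ("User is going skiing on Feb 3", 1.00),
    ("User recovering from flu", 0.60),
    ("User's flight to Barcelona", 0.59),
    ("User's birthday coming up", 0.59),
    ("User broke their leg", 0.58),
]


def top_k_by_similarity(facts: list[tuple[str, float]], k: int) -> list[str]:
    """Return the k fact texts with the highest similarity.

    sorted() is stable even with reverse=True, so ties keep input order.
    """
    ranked = sorted(facts, key=lambda item: item[1], reverse=True)
    return [text for text, _ in ranked[:k]]


selected = top_k_by_similarity(FACTS, k=3)
print(selected)
# ['User is going skiing on Feb 3', 'User recovering from flu', "User's flight to Barcelona"]
```

The broken-leg fact sits at rank 5 no matter how the 0.59 tie is broken, so it only appears once k reaches 5.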

Why Keywords Don't Work

We considered adding keyword matching (if "skiing" appears in both the query and the fact prefix, boost the score). This was rejected because it is:

  • Brittle: doesn't scale to all activities
  • Language-dependent: "skiing" vs "skifahren" vs "esquí"
  • A maintenance burden: requires curating activity lists
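A minimal sketch of the rejected approach shows the brittleness directly. `ACTIVITY_KEYWORDS` and `keyword_boost` are hypothetical illustrations of the idea, not code from the repository, and the fact prefixes (`injury`, `health`) are assumed labels:

```python
# Hypothetical, hand-curated mapping: activity keyword -> fact prefixes to boost.
ACTIVITY_KEYWORDS = {"skiing": ["injury", "health"]}


def keyword_boost(query: str, fact_prefix: str, boost: float = 0.2) -> float:
    """Return `boost` if a listed activity appears in the query and the
    fact's prefix is one of that activity's prefixes, else 0.0."""
    q = query.lower()
    for activity, prefixes in ACTIVITY_KEYWORDS.items():
        if activity in q and fact_prefix in prefixes:
            return boost
    return 0.0


print(keyword_boost("I'm going skiing tomorrow", "injury"))        # 0.2
print(keyword_boost("Ich gehe morgen Skifahren", "injury"))        # 0.0: German query
print(keyword_boost("I'm going snowboarding tomorrow", "injury"))  # 0.0: unlisted activity
```

Every new activity and every new language needs another hand-maintained entry, which is the maintenance burden listed above.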


Decision

Implemented: Google Discovery Engine Ranking API with the semantic-ranker-default-004 model.

Implementation Details

Three-stage retrieval pipeline:

  1. Vector Search: Retrieve RERANKER_CANDIDATES (15) candidates with similarity >= threshold
  2. Relevance Scoring: Rank by similarity × computed_relevance
  3. Semantic Reranking: Rerank by query-fact relevance using Discovery Engine
# Actual implementation in entity_context_loader.py
# Step 1: Vector search returns 15 candidates per scope
candidate_count = settings.RERANKER_CANDIDATES  # 15

# Step 2: After parallel vector search, rerank each scope
if settings.RERANKER_ENABLED:
    user_facts, entity_facts_lists = await self._rerank_facts(
        query, user_facts, entity_facts_lists, top_k=5
    )
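The three stages can be sketched as one function with the reranker injected as a callable. This is an illustration under assumptions: `Fact`, `retrieve`, the `computed_relevance` field and the 0.5 threshold are hypothetical; only the candidate count (15) and final count (5) come from the configuration in this ADR.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Fact:
    text: str
    similarity: float          # cosine similarity to the query (vector search)
    computed_relevance: float  # hypothetical per-fact relevance weight


# A reranker takes the query and candidate facts and returns them reordered.
Reranker = Callable[[str, list[Fact]], list[Fact]]


def retrieve(
    query: str,
    facts: Sequence[Fact],
    rerank: Reranker,
    threshold: float = 0.5,  # hypothetical similarity cut-off
    candidates: int = 15,    # mirrors RERANKER_CANDIDATES
    top_n: int = 5,          # mirrors RERANKER_TOP_N
) -> list[Fact]:
    # Stage 1 (stand-in for vector search): best `candidates` above threshold.
    hits = sorted(
        (f for f in facts if f.similarity >= threshold),
        key=lambda f: f.similarity,
        reverse=True,
    )[:candidates]
    # Stage 2: order candidates by similarity x computed_relevance.
    scored = sorted(
        hits, key=lambda f: f.similarity * f.computed_relevance, reverse=True
    )
    # Stage 3: semantic reranking against the query, then keep top_n.
    return rerank(query, scored)[:top_n]
```

Injecting `rerank` keeps the pipeline testable without network access; passing an identity function reproduces the pre-reranker behaviour (stage 2 order, truncated).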

Model Choice

| Model | Latency | Accuracy | Notes |
|-------|---------|----------|-------|
| semantic-ranker-default-004 | ~80-100ms | Higher | Best for safety-critical facts |
| semantic-ranker-fast-004 | ~80-120ms | High | Similar latency, lower quality |
| Cohere Rerank 3 | ~100ms | High | External vendor, not used |
| Voyage Rerank 2 | ~80ms | High | External vendor, not used |

Why semantic-ranker-default-004?

  • Better ranking of safety-critical facts (the injury fact ranked #2 with the default model vs #4 with the fast model)
  • Similar latency to the fast model (~80-100ms)
  • Correctly pushes irrelevant facts to the bottom

Why Google Discovery Engine?

  • Same GCP project (swisper), so no additional auth is needed
  • Uses the global endpoint (the EU endpoint requires app setup)
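For reference, a minimal sketch of a Ranking API call using the public `google-cloud-discoveryengine` Python client (`discoveryengine_v1.RankServiceClient`, `RankRequest`, `RankingRecord`). `reranker_service.py` is not reproduced in this ADR, so this is not its actual code: `rank_facts` and `reorder_by_ids` are hypothetical names, the `default_ranking_config` name follows Google's published samples, and an async service would more likely use `RankServiceAsyncClient`.

```python
from typing import Sequence


def rank_facts(
    query: str,
    facts: Sequence[str],
    project_id: str = "swisper",
    location: str = "global",
    model: str = "semantic-ranker-default-004",
    top_n: int = 5,
) -> list[str]:
    """Rerank fact texts with the Discovery Engine Ranking API (sync client)."""
    # Imported lazily so this module loads even without the dependency.
    from google.cloud import discoveryengine_v1 as discoveryengine

    client = discoveryengine.RankServiceClient()
    ranking_config = client.ranking_config_path(
        project=project_id,
        location=location,
        ranking_config="default_ranking_config",
    )
    request = discoveryengine.RankRequest(
        ranking_config=ranking_config,
        model=model,
        top_n=top_n,
        query=query,
        # Use the list index as the record ID so results map back to facts.
        records=[
            discoveryengine.RankingRecord(id=str(i), content=text)
            for i, text in enumerate(facts)
        ],
    )
    response = client.rank(request=request)
    # Returned records are ordered by descending relevance score.
    return reorder_by_ids(facts, [record.id for record in response.records])


def reorder_by_ids(facts: Sequence[str], ranked_ids: list[str]) -> list[str]:
    """Map ranked record IDs (stringified list indices) back to fact texts."""
    return [facts[int(record_id)] for record_id in ranked_ids]
```

Keeping the ID-to-fact mapping in a pure helper makes the ordering logic testable without calling the API.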

Files Changed

  • backend/app/api/services/reranker_service.py - New service
  • backend/app/api/services/agents/global_supervisor/nodes/memory_node_helpers/entity_context_loader.py - Integration
  • backend/app/core/config.py - Config settings
  • backend/pyproject.toml - Added google-cloud-discoveryengine dependency

Configuration

# .env settings
RERANKER_ENABLED=true          # Enable/disable reranking
RERANKER_GCP_PROJECT_ID=swisper
RERANKER_LOCATION=global       # global, us, or eu (EU requires app setup)
RERANKER_MODEL=semantic-ranker-default-004  # Better quality for safety facts
RERANKER_TOP_N=5               # Final facts after reranking
RERANKER_CANDIDATES=15         # Candidates before reranking
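A stdlib-only sketch of reading and validating these settings, assuming the values in the `.env` above are also the defaults. The real `config.py` likely uses its own settings class, and `RerankerSettings` is a hypothetical name:

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RerankerSettings:
    enabled: bool
    gcp_project_id: str
    location: str
    model: str
    top_n: int
    candidates: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RerankerSettings":
        # Defaults are assumptions mirroring the .env example.
        settings = cls(
            enabled=_as_bool(env.get("RERANKER_ENABLED", "true")),
            gcp_project_id=env.get("RERANKER_GCP_PROJECT_ID", "swisper"),
            location=env.get("RERANKER_LOCATION", "global"),
            model=env.get("RERANKER_MODEL", "semantic-ranker-default-004"),
            top_n=int(env.get("RERANKER_TOP_N", "5")),
            candidates=int(env.get("RERANKER_CANDIDATES", "15")),
        )
        # Reranking needs at least as many candidates as final results.
        if settings.candidates < settings.top_n:
            raise ValueError("RERANKER_CANDIDATES must be >= RERANKER_TOP_N")
        if settings.location not in {"global", "us", "eu"}:
            raise ValueError(f"unsupported RERANKER_LOCATION: {settings.location}")
        return settings
```

Failing fast on RERANKER_CANDIDATES < RERANKER_TOP_N catches a misconfiguration that would otherwise silently return fewer facts than expected.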

Consequences

Benefits (Implemented)

  • ✅ Improved relevance for safety-critical facts (e.g., "broken leg" retrieved for "going skiing")
  • ✅ Better multilingual support (reranker handles semantic relationships across languages)
  • ✅ Graceful degradation (falls back to vector search order on failure)
  • ✅ Feature-flagged (can disable via RERANKER_ENABLED=false)
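The fallback and feature-flag behaviour can be sketched as a small async wrapper. `rerank_with_fallback` and the `RerankFn` type are hypothetical; the actual `_rerank_facts()` method works per scope on user and entity fact lists and may differ in detail:

```python
import logging
from typing import Awaitable, Callable, Sequence

logger = logging.getLogger(__name__)

# Async reranker: (query, candidates) -> candidates in reranked order.
RerankFn = Callable[[str, list[str]], Awaitable[list[str]]]


async def rerank_with_fallback(
    query: str,
    facts: Sequence[str],
    rerank: RerankFn,
    enabled: bool,
    top_n: int = 5,
) -> list[str]:
    """Return up to top_n facts, reranked when possible.

    Falls back to the incoming (vector search) order when reranking is
    disabled or the reranker raises.
    """
    candidates = list(facts)
    if not enabled or not candidates:
        return candidates[:top_n]
    try:
        return (await rerank(query, candidates))[:top_n]
    except Exception:  # network, quota, auth errors, etc.
        logger.warning("Reranker failed; using vector search order", exc_info=True)
        return candidates[:top_n]
```

Catching `Exception` rather than `BaseException` lets task cancellation propagate while still degrading gracefully on API errors.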

Trade-offs

  • Additional latency (~50-100ms per retrieval)
  • Additional cost (Discovery Engine API calls)
  • Requires the Discovery Engine API to be enabled on the GCP project

References

  • Service: backend/app/api/services/reranker_service.py
  • Integration: entity_context_loader.py _rerank_facts() method
  • Config: backend/app/core/config.py RERANKER_* settings
  • Fact extraction prompt: fact_extraction.md Rule 4 (injury prefixes - still useful)

Action Items

  • [x] Evaluate reranker options (chose Google Discovery Engine)
  • [x] Implement reranker as optional feature flag
  • [ ] Benchmark latency impact on fact retrieval pipeline
  • [ ] Add tests for safety-critical fact retrieval scenarios (TC-11k)
  • [ ] Monitor reranker cost in production
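For the open latency benchmark item, a minimal timing harness sketch. `benchmark` is a hypothetical helper; running it against the retrieval path with RERANKER_ENABLED on and off would isolate the reranker's share of the latency:

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def benchmark(
    call: Callable[[], Awaitable[object]], runs: int = 50
) -> dict[str, float]:
    """Await `call` `runs` times sequentially; return p50/p95/max latency in ms."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    samples_ms: list[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        await call()
        samples_ms.append((time.perf_counter() - start) * 1000)
    # n=20 gives 19 cut points; index 18 is the 95th percentile.
    # "inclusive" keeps percentiles within the observed min..max range.
    cuts = statistics.quantiles(samples_ms, n=20, method="inclusive")
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": cuts[18],
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    async def fake_retrieval() -> None:
        await asyncio.sleep(0.01)  # stand-in for the real retrieval call

    print(asyncio.run(benchmark(fake_retrieval, runs=20)))
```

Calls run sequentially so each sample reflects one retrieval, not contention between concurrent requests.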