ADR-003: Fact Retrieval Reranker

- Status: Implemented ✅
- Date: 2026-02-02 (spec), 2026-02-03 (implemented)
- Author: Architecture Team
- Related: Semantic Retrieval, Entity Context Loader, Safety-Critical Fact Retrieval
Context
The current fact retrieval system uses pure vector similarity search (cosine distance) to find relevant facts for a user query. This approach has a fundamental limitation:
Semantic similarity ≠ task relevance
The Problem

When a user says "I'm going skiing tomorrow", the system should retrieve:

- Travel/activity facts about skiing (works fine: high semantic similarity)
- Health facts about injuries (fails: low semantic similarity despite critical relevance)

Example:

| Fact | Similarity to "going skiing" | Should Retrieve? |
|---|---|---|
| "User is going skiing on Feb 3" | 1.00 | Yes ✅ |
| "User recovering from flu" | 0.60 | Maybe |
| "User's flight to Barcelona" | 0.59 | No |
| "User's birthday coming up" | 0.59 | No |
| "User broke their leg" | 0.58 | Yes (safety!) ❌ |
The broken leg fact ranks 5th despite being critically relevant for skiing safety. With top_k=3, it gets excluded.
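The sketch below replays the similarity scores from the table with plain top-k selection, which makes the failure concrete. `top_k_by_similarity` is a hypothetical helper written for illustration and is not code from this project.

```python
# Similarity scores copied from the example table.
facts = {
    "User is going skiing on Feb 3": 1.00,
    "User recovering from flu": 0.60,
    "User's flight to Barcelona": 0.59,
    "User's birthday coming up": 0.59,
    "User broke their leg": 0.58,
}


def top_k_by_similarity(scores: dict[str, float], k: int) -> list[str]:
    # Pure vector retrieval: the highest similarity wins and nothing else counts.
    # Ties keep their insertion order because Python's sort is stable.
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(top_k_by_similarity(facts, k=3))
```

The broken-leg fact is last in the full ranking, so any `k` below 5 drops it.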
Why Keywords Don't Work

We considered adding keyword matching: if "skiing" appears in both the query and a fact's prefix, boost that fact's score. This was rejected because it is:

- Brittle: doesn't scale to all activities
- Language-dependent: "skiing" vs "skifahren" vs "esquí"
- A maintenance burden: requires curating activity lists
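The sketch below shows roughly what the rejected approach would look like and why it breaks across languages. `RISKY_ACTIVITIES`, `keyword_boost`, and the 0.3 boost are hypothetical. The sketch matches keywords anywhere in the fact text rather than in a specific prefix format.

```python
# Hypothetical curated list: the rejected approach needs an entry per activity
# (and per language), which is exactly the maintenance burden described above.
RISKY_ACTIVITIES = {"skiing": ("injury", "broke", "leg")}


def keyword_boost(query: str, fact: str, score: float, boost: float = 0.3) -> float:
    # Boost the fact only if the query names a curated activity AND the fact
    # mentions one of that activity's risk keywords.
    q, f = query.lower(), fact.lower()
    for activity, keywords in RISKY_ACTIVITIES.items():
        if activity in q and any(k in f for k in keywords):
            return score + boost
    return score


for query in ("I'm going skiing tomorrow", "Ich gehe morgen skifahren"):
    print(query, "->", keyword_boost(query, "User broke their leg", 0.58))
```

The English query gets the boost. The German query does not, because "skifahren" never matches the curated word "skiing".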
Decision

Implemented: Google Discovery Engine Ranking API with the semantic-ranker-default-004 model.
Implementation Details

Three-stage retrieval pipeline:

- Vector Search: retrieve `RERANKER_CANDIDATES` (15) candidates with similarity >= threshold
- Relevance Scoring: rank by `similarity × computed_relevance`
- Semantic Reranking: rerank by query-fact relevance using Discovery Engine
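Under simplifying assumptions, the three stages compose as follows. Several pieces are hypothetical:

- the `Fact` type
- the `retrieve` function
- the pluggable `rerank` callable
- the 0.5 threshold

`relevance` stands in for whatever `computed_relevance` is in the real code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Fact:
    text: str
    similarity: float  # cosine similarity returned by the vector search
    relevance: float = 1.0  # hypothetical stand-in for computed_relevance


def retrieve(
    query: str,
    facts: list[Fact],
    rerank: Callable[[str, list[Fact]], list[Fact]],
    threshold: float = 0.5,  # hypothetical; the real threshold is not given here
    candidates: int = 15,  # RERANKER_CANDIDATES
    top_n: int = 5,  # RERANKER_TOP_N
) -> list[Fact]:
    # Stage 1: vector search keeps facts at or above the threshold, best first,
    # capped at the candidate budget.
    hits = sorted(
        (f for f in facts if f.similarity >= threshold),
        key=lambda f: f.similarity,
        reverse=True,
    )[:candidates]
    # Stage 2: order candidates by similarity x computed relevance.
    scored = sorted(hits, key=lambda f: f.similarity * f.relevance, reverse=True)
    # Stage 3: the semantic reranker reorders by query-fact relevance,
    # and only the top_n facts are kept.
    return rerank(query, scored)[:top_n]
```

Passing an identity `rerank` (`lambda q, fs: fs`) gives the pure vector-search ordering, which is useful as a baseline when comparing results.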
```python
# Actual implementation in entity_context_loader.py
# Step 1: Vector search returns 15 candidates per scope
candidate_count = settings.RERANKER_CANDIDATES  # 15

# Step 2: After parallel vector search, rerank each scope
if settings.RERANKER_ENABLED:
    user_facts, entity_facts_lists = await self._rerank_facts(
        query, user_facts, entity_facts_lists, top_k=5
    )
```
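The excerpt reranks each scope separately: the user's own facts, plus one list per entity. Below is a self-contained sketch of that step. It assumes a hypothetical async `reranker(query, facts, top_k)` callable. It also runs the scopes concurrently with `asyncio.gather`, which the excerpt does not confirm `_rerank_facts` does.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical reranker signature: (query, facts, top_k) -> reordered facts.
Reranker = Callable[[str, list[str], int], Awaitable[list[str]]]


async def rerank_scopes(
    query: str,
    user_facts: list[str],
    entity_facts_lists: list[list[str]],
    reranker: Reranker,
    top_k: int = 5,
) -> tuple[list[str], list[list[str]]]:
    async def rerank_one(facts: list[str]) -> list[str]:
        # Skip the API call entirely for empty scopes.
        if not facts:
            return []
        return await reranker(query, facts, top_k)

    # User facts first, then one list per entity; gather preserves this order.
    results = await asyncio.gather(
        rerank_one(user_facts), *(rerank_one(fs) for fs in entity_facts_lists)
    )
    return results[0], list(results[1:])
```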
Model Choice
| Model | Latency | Accuracy | Notes |
|---|---|---|---|
| semantic-ranker-default-004 ✅ | ~80-100ms | Higher | Best for safety-critical facts |
| semantic-ranker-fast-004 | ~80-120ms | High | Similar latency, lower quality |
| Cohere Rerank 3 | ~100ms | High | External vendor, not used |
| Voyage Rerank 2 | ~80ms | High | External vendor, not used |
Why semantic-ranker-default-004?
- Better ranking of safety-critical facts (the injury fact ranked #2 with default-004 vs #4 with fast-004)
- Similar latency to the fast model (~80-100ms)
- Correctly pushes irrelevant facts to the bottom
Why Google Discovery Engine?
- Same GCP project (swisper), so no additional authentication is needed
- Uses the global endpoint (the EU endpoint requires additional app setup)
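For orientation, a Ranking API call through the `google-cloud-discoveryengine` client might look roughly like the sketch below. It is not the code in `reranker_service.py`, and `rank_facts` and `ranking_config_name` are hypothetical names. The sketch assumes Application Default Credentials for the `swisper` project and covers only the global endpoint.

```python
def ranking_config_name(project_id: str, location: str = "global") -> str:
    # Resource name of the built-in ranking config the Ranking API expects.
    return (
        f"projects/{project_id}/locations/{location}"
        "/rankingConfigs/default_ranking_config"
    )


def rank_facts(
    query: str,
    facts: list[str],
    project_id: str = "swisper",
    location: str = "global",
    model: str = "semantic-ranker-default-004",
    top_n: int = 5,
) -> list[str]:
    # Imported lazily so importing this module does not require the package.
    from google.cloud import discoveryengine_v1 as discoveryengine

    client = discoveryengine.RankServiceClient()
    request = discoveryengine.RankRequest(
        ranking_config=ranking_config_name(project_id, location),
        model=model,
        top_n=top_n,
        query=query,
        # Use the list index as the record id to map results back to facts.
        records=[
            discoveryengine.RankingRecord(id=str(i), content=fact)
            for i, fact in enumerate(facts)
        ],
    )
    response = client.rank(request=request)
    # The API returns records sorted by descending relevance score.
    return [facts[int(record.id)] for record in response.records]
```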
Files Changed

- `backend/app/api/services/reranker_service.py` - New service
- `backend/app/api/services/agents/global_supervisor/nodes/memory_node_helpers/entity_context_loader.py` - Integration
- `backend/app/core/config.py` - Config settings
- `backend/pyproject.toml` - Added `google-cloud-discoveryengine` dependency
Configuration

```bash
# .env settings
RERANKER_ENABLED=true # Enable/disable reranking
RERANKER_GCP_PROJECT_ID=swisper
RERANKER_LOCATION=global # global, us, or eu (EU requires app setup)
RERANKER_MODEL=semantic-ranker-default-004 # Better quality for safety facts
RERANKER_TOP_N=5 # Final facts after reranking
RERANKER_CANDIDATES=15 # Candidates before reranking
```
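The sketch below shows one way these variables could be parsed and sanity-checked. It uses only the standard library and is not the actual `config.py`. `RerankerSettings`, `from_env`, and both validation rules are hypothetical. The variables are assumed to be already in the process environment, for example loaded from `.env`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_VALID_LOCATIONS = {"global", "us", "eu"}


@dataclass(frozen=True)
class RerankerSettings:
    # Defaults mirror the .env example above.
    enabled: bool = True
    gcp_project_id: str = "swisper"
    location: str = "global"
    model: str = "semantic-ranker-default-004"
    top_n: int = 5
    candidates: int = 15

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RerankerSettings":
        d = cls()  # instance holding the defaults
        settings = cls(
            enabled=env.get("RERANKER_ENABLED", str(d.enabled)).strip().lower()
            in {"1", "true", "yes"},
            gcp_project_id=env.get("RERANKER_GCP_PROJECT_ID", d.gcp_project_id),
            location=env.get("RERANKER_LOCATION", d.location),
            model=env.get("RERANKER_MODEL", d.model),
            top_n=int(env.get("RERANKER_TOP_N", d.top_n)),
            candidates=int(env.get("RERANKER_CANDIDATES", d.candidates)),
        )
        # Hypothetical sanity checks, not necessarily present in config.py.
        if settings.location not in _VALID_LOCATIONS:
            raise ValueError(f"unsupported RERANKER_LOCATION: {settings.location!r}")
        if not 0 < settings.top_n <= settings.candidates:
            raise ValueError("RERANKER_TOP_N must be between 1 and RERANKER_CANDIDATES")
        return settings
```

Reading from a plain mapping keeps the parser testable without touching the real environment.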
Consequences

Benefits (Implemented)
- ✅ Improved relevance for safety-critical facts (e.g., "broken leg" retrieved for "going skiing")
- ✅ Better multilingual support (reranker handles semantic relationships across languages)
- ✅ Graceful degradation (falls back to vector search order on failure)
- ✅ Feature-flagged (can disable via `RERANKER_ENABLED=false`)
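The graceful-degradation and feature-flag behaviour listed above can be sketched as a small async wrapper. `rerank_or_fallback` and the reranker signature are hypothetical. The sketch treats any exception from the reranker as a failure and falls back to the vector-search order.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Sequence

logger = logging.getLogger(__name__)

# Hypothetical reranker signature: returns the facts reordered by relevance.
Reranker = Callable[[str, Sequence[str]], Awaitable[list[str]]]


async def rerank_or_fallback(
    query: str,
    facts: Sequence[str],
    reranker: Reranker,
    top_n: int = 5,
    enabled: bool = True,
) -> list[str]:
    # Flag off or nothing to rank: keep the vector-search order.
    if not enabled or not facts:
        return list(facts[:top_n])
    try:
        reranked = await reranker(query, facts)
    except Exception:
        # Any failure (network, quota, auth) degrades to vector-search order
        # instead of failing the whole retrieval.
        logger.warning("Reranker failed; using vector search order", exc_info=True)
        return list(facts[:top_n])
    return reranked[:top_n]


if __name__ == "__main__":
    async def broken_reranker(query: str, facts: Sequence[str]) -> list[str]:
        raise TimeoutError("simulated Discovery Engine timeout")

    facts = ["User is going skiing on Feb 3", "User recovering from flu", "User broke their leg"]
    print(asyncio.run(rerank_or_fallback("going skiing", facts, broken_reranker, top_n=2)))
```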
Trade-offs

- Additional latency (~50-100ms per retrieval)
- Additional cost (Discovery Engine API calls)
- Requires the Discovery Engine API to be enabled on the GCP project
References

- Service: `backend/app/api/services/reranker_service.py`
- Integration: `entity_context_loader.py`, `_rerank_facts()` method
- Config: `backend/app/core/config.py`, `RERANKER_*` settings
- Fact extraction prompt: `fact_extraction.md`, Rule 4 (injury prefixes; still useful)
Action Items
- [x] Evaluate reranker options (chose Google Discovery Engine)
- [x] Implement reranker as optional feature flag
- [ ] Benchmark latency impact on fact retrieval pipeline
- [ ] Add tests for safety-critical fact retrieval scenarios (TC-11k)
- [ ] Monitor reranker cost in production
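A possible starting point for the open latency benchmark is a small async timing harness. It could be run against the retrieval call once with `RERANKER_ENABLED` on and once with it off. `benchmark` is a hypothetical helper. It times calls sequentially and reports the median, 95th percentile, and maximum in milliseconds.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def benchmark(call: Callable[[], Awaitable[object]], runs: int = 50) -> dict[str, float]:
    if runs < 2:
        raise ValueError("need at least two runs for percentiles")
    samples_ms: list[float] = []
    for _ in range(runs):
        # Run sequentially so each sample measures one isolated call.
        start = time.perf_counter()
        await call()
        samples_ms.append((time.perf_counter() - start) * 1000)
    # 20-quantiles give 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps every estimate within the observed range.
    p95 = statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": p95,
        "max_ms": max(samples_ms),
    }


if __name__ == "__main__":
    async def fake_rerank() -> None:
        await asyncio.sleep(0.01)  # stand-in for a ~10 ms reranker call

    print(asyncio.run(benchmark(fake_rerank, runs=20)))
```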