TDR-005: Fact Retrieval Service Has Dead Metrics, Read/Write Coupling, and Redundant Wrapping¶
Description¶
apps/backend/swisper/services/fact_retrieval.py is the load-bearing read path for user facts (used by RecallTool, the planner, and the proposed voice-v3 ear-side memory tool). Reviewing it surfaced several smells worth a focused cleanup pass before we put it on the per-turn voice latency budget.
Issues observed in retrieve_relevant_facts_cached and its neighbours:
-
Prometheus metrics with no Prometheus.
facts_retrieved_total,fact_relevance_score_distribution,fact_retrieval_duration_seconds, andfact_fallback_queries_totalare imported and incremented on every call. The platform does not run a Prometheus scraper today — these are dead-cost noop counters with import overhead. -
Read function commits the DB session. Lines ~309-316 bump
last_used_atand calldb.commit()from inside what looks like a pure read. This silently autonomously commits any other pending work in the caller's request-scoped session. On the error path, lines ~320-321 / ~347-348 also calldb.rollback(). A retrieval call should not own the caller's transaction lifecycle. -
Redundant wrapper.
retrieve_relevant_facts(lines 123-174) is a thin pass-through that just callsretrieve_relevant_facts_cachedwith the same arguments. Two public methods with identical semantics is a maintenance trap — callers can't tell if there's a difference. -
Threshold default mismatch. The function default is
relevance_threshold=0.7(strict). RecallTool overrides to0.3(permissive) every call. The "right" default isn't obvious without data — currently it depends on which entry point the caller used. Should be one value, chosen by measurement, baked in. -
Two cache layers, mismatched TTLs. Embedding cache is 5-minute TTL (line ~383); fact-result cache is 1-hour (line ~334). A user repeating the same question 6 minutes later re-embeds but reuses fact-result cache (different cache key paths). The TTL mismatch creates non-obvious behaviour without an obvious benefit.
-
Fallback query fires automatically. If primary returns empty, lines ~244-256 immediately retry with a lower threshold. That's two pgvector round-trips for empty results. A single relaxed-threshold query, or trusting the caller's threshold, would be simpler.
-
Reranker feature flag via env var with cache mismatch risk.
_is_reranker_enabled()is checked on each call (line ~222). If toggled mid-process, cached pre-rerank results may be returned to a post-rerank caller (or vice versa) since the rerank state is not in the cache key. -
Durability decay computed in raw SQL string.
_get_durability_decay_sql()(lines ~404-418) returns a string concatenated into the query. Hardcoded half-lives (7d / 90d / 365d) live inside the string. Hard to test, hard to vary by experiment, easy to typo into a silent regression. -
Verbose per-call logging. Every call logs cache hit/miss, retrieved-facts count, and per-fact scores at INFO. Acceptable on the planner path (low cadence). On the proposed voice-v3 reactive
recallpath it would log every user turn — log volume worth quantifying before we add another caller.
Impact¶
- Latency: read-side commit (issue 2) couples read latency to write latency. On voice's tight per-turn budget, an unexpected
db.commit()blocking the response is hard to predict and hard to debug. Dead Prometheus increments (issue 1) are negligible individually but show up at voice cadence. - Correctness: silent transaction commits/rollbacks (issue 2) can clobber unrelated in-flight writes in the caller's session.
- Maintainability: redundant wrapper (issue 3), threshold mismatch (issue 4), and SQL-as-string decay (issue 8) make changes risky and hard to reason about.
- Observability: dead metrics (issue 1) are misleading — they look like the service is observable when it isn't.
Proposed Resolution¶
Single small refactor PR, before adding the voice-v3 reactive recall ear-tool:
- Delete the unused Prometheus metric imports and increments. If we want metrics later, add them through whatever observability stack we actually run.
- Move
last_used_atbump out of the retrieval function — return the fact ids and let a higher-level caller decide when (and on which session) to commit. Or do it in a separate background task. Either way, drop thedb.commit()/db.rollback()from the read path. - Delete
retrieve_relevant_facts. Keep onlyretrieve_relevant_facts_cached(renaming toretrieve_relevant_factsif the suffix feels clinical). - Pick one default
relevance_thresholdbased on a quick eyeball of cache hit data; remove the per-call override at RecallTool. - Decide: collapse fallback into a single permissive query OR keep two-stage but document why.
- Move durability half-lives to a typed config object; build the decay expression with SQLAlchemy primitives, not string concatenation.
- Demote per-call retrieval logs to DEBUG; emit a single INFO summary line per call with counts only.
Priority¶
Medium. Not blocking voice-v3 ear-tools — the reactive recall path can ship on top of the current code. But the read/write coupling (issue 2) is a real footgun and the dead metrics (issue 1) are quick wins. Worth a half-day cleanup pass before this file ends up on yet another hot path.
Files Affected¶
apps/backend/swisper/services/fact_retrieval.py— primary cleanupapps/backend/swisper/agents/memory/tools/recall.py— drop threshold override, adjust to single retrieval entry point- Any other caller of
retrieve_relevant_facts(the legacy non-cached entry point) — update import