TDR-005: Fact Retrieval Service Has Dead Metrics, Read/Write Coupling, and Redundant Wrapping¶

Description¶

apps/backend/swisper/services/fact_retrieval.py is the load-bearing read path for user facts (used by RecallTool, the planner, and the proposed voice-v3 ear-side memory tool). Reviewing it surfaced several smells worth a focused cleanup pass before we put it on the per-turn voice latency budget.

Issues observed in retrieve_relevant_facts_cached and its neighbours:

Prometheus metrics with no Prometheus. facts_retrieved_total, fact_relevance_score_distribution, fact_retrieval_duration_seconds, and fact_fallback_queries_total are imported and incremented on every call. The platform does not run a Prometheus scraper today — these are dead-cost noop counters with import overhead.
Read function commits the DB session. Lines ~309-316 bump last_used_at and call db.commit() from inside what looks like a pure read. This silently autonomously commits any other pending work in the caller's request-scoped session. On the error path, lines ~320-321 / ~347-348 also call db.rollback(). A retrieval call should not own the caller's transaction lifecycle.
Redundant wrapper. retrieve_relevant_facts (lines 123-174) is a thin pass-through that just calls retrieve_relevant_facts_cached with the same arguments. Two public methods with identical semantics is a maintenance trap — callers can't tell if there's a difference.
Threshold default mismatch. The function default is relevance_threshold=0.7 (strict). RecallTool overrides to 0.3 (permissive) every call. The "right" default isn't obvious without data — currently it depends on which entry point the caller used. Should be one value, chosen by measurement, baked in.
Two cache layers, mismatched TTLs. Embedding cache is 5-minute TTL (line ~383); fact-result cache is 1-hour (line ~334). A user repeating the same question 6 minutes later re-embeds but reuses fact-result cache (different cache key paths). The TTL mismatch creates non-obvious behaviour without an obvious benefit.
Fallback query fires automatically. If primary returns empty, lines ~244-256 immediately retry with a lower threshold. That's two pgvector round-trips for empty results. A single relaxed-threshold query, or trusting the caller's threshold, would be simpler.
Reranker feature flag via env var with cache mismatch risk. _is_reranker_enabled() is checked on each call (line ~222). If toggled mid-process, cached pre-rerank results may be returned to a post-rerank caller (or vice versa) since the rerank state is not in the cache key.
Durability decay computed in raw SQL string. _get_durability_decay_sql() (lines ~404-418) returns a string concatenated into the query. Hardcoded half-lives (7d / 90d / 365d) live inside the string. Hard to test, hard to vary by experiment, easy to typo into a silent regression.
Verbose per-call logging. Every call logs cache hit/miss, retrieved-facts count, and per-fact scores at INFO. Acceptable on the planner path (low cadence). On the proposed voice-v3 reactive recall path it would log every user turn — log volume worth quantifying before we add another caller.

Impact¶

Latency: read-side commit (issue 2) couples read latency to write latency. On voice's tight per-turn budget, an unexpected db.commit() blocking the response is hard to predict and hard to debug. Dead Prometheus increments (issue 1) are negligible individually but show up at voice cadence.
Correctness: silent transaction commits/rollbacks (issue 2) can clobber unrelated in-flight writes in the caller's session.
Maintainability: redundant wrapper (issue 3), threshold mismatch (issue 4), and SQL-as-string decay (issue 8) make changes risky and hard to reason about.
Observability: dead metrics (issue 1) are misleading — they look like the service is observable when it isn't.

Proposed Resolution¶

Single small refactor PR, before adding the voice-v3 reactive recall ear-tool:

Delete the unused Prometheus metric imports and increments. If we want metrics later, add them through whatever observability stack we actually run.
Move last_used_at bump out of the retrieval function — return the fact ids and let a higher-level caller decide when (and on which session) to commit. Or do it in a separate background task. Either way, drop the db.commit() / db.rollback() from the read path.
Delete retrieve_relevant_facts. Keep only retrieve_relevant_facts_cached (renaming to retrieve_relevant_facts if the suffix feels clinical).
Pick one default relevance_threshold based on a quick eyeball of cache hit data; remove the per-call override at RecallTool.
Decide: collapse fallback into a single permissive query OR keep two-stage but document why.
Move durability half-lives to a typed config object; build the decay expression with SQLAlchemy primitives, not string concatenation.
Demote per-call retrieval logs to DEBUG; emit a single INFO summary line per call with counts only.

Priority¶

Medium. Not blocking voice-v3 ear-tools — the reactive recall path can ship on top of the current code. But the read/write coupling (issue 2) is a real footgun and the dead metrics (issue 1) are quick wins. Worth a half-day cleanup pass before this file ends up on yet another hot path.

Files Affected¶

apps/backend/swisper/services/fact_retrieval.py — primary cleanup
apps/backend/swisper/agents/memory/tools/recall.py — drop threshold override, adjust to single retrieval entry point
Any other caller of retrieve_relevant_facts (the legacy non-cached entry point) — update import