Skip to content

TDR-005: Fact Retrieval Service Has Dead Metrics, Read/Write Coupling, and Redundant Wrapping

Description

apps/backend/swisper/services/fact_retrieval.py is the load-bearing read path for user facts (used by RecallTool, the planner, and the proposed voice-v3 ear-side memory tool). Reviewing it surfaced several smells worth a focused cleanup pass before we put it on the per-turn voice latency budget.

Issues observed in retrieve_relevant_facts_cached and its neighbours:

  1. Prometheus metrics with no Prometheus. facts_retrieved_total, fact_relevance_score_distribution, fact_retrieval_duration_seconds, and fact_fallback_queries_total are imported and incremented on every call. The platform does not run a Prometheus scraper today — these are dead-cost noop counters with import overhead.

  2. Read function commits the DB session. Lines ~309-316 bump last_used_at and call db.commit() from inside what looks like a pure read. This silently autonomously commits any other pending work in the caller's request-scoped session. On the error path, lines ~320-321 / ~347-348 also call db.rollback(). A retrieval call should not own the caller's transaction lifecycle.

  3. Redundant wrapper. retrieve_relevant_facts (lines 123-174) is a thin pass-through that just calls retrieve_relevant_facts_cached with the same arguments. Two public methods with identical semantics is a maintenance trap — callers can't tell if there's a difference.

  4. Threshold default mismatch. The function default is relevance_threshold=0.7 (strict). RecallTool overrides to 0.3 (permissive) every call. The "right" default isn't obvious without data — currently it depends on which entry point the caller used. Should be one value, chosen by measurement, baked in.

  5. Two cache layers, mismatched TTLs. Embedding cache is 5-minute TTL (line ~383); fact-result cache is 1-hour (line ~334). A user repeating the same question 6 minutes later re-embeds but reuses fact-result cache (different cache key paths). The TTL mismatch creates non-obvious behaviour without an obvious benefit.

  6. Fallback query fires automatically. If primary returns empty, lines ~244-256 immediately retry with a lower threshold. That's two pgvector round-trips for empty results. A single relaxed-threshold query, or trusting the caller's threshold, would be simpler.

  7. Reranker feature flag via env var with cache mismatch risk. _is_reranker_enabled() is checked on each call (line ~222). If toggled mid-process, cached pre-rerank results may be returned to a post-rerank caller (or vice versa) since the rerank state is not in the cache key.

  8. Durability decay computed in raw SQL string. _get_durability_decay_sql() (lines ~404-418) returns a string concatenated into the query. Hardcoded half-lives (7d / 90d / 365d) live inside the string. Hard to test, hard to vary by experiment, easy to typo into a silent regression.

  9. Verbose per-call logging. Every call logs cache hit/miss, retrieved-facts count, and per-fact scores at INFO. Acceptable on the planner path (low cadence). On the proposed voice-v3 reactive recall path it would log every user turn — log volume worth quantifying before we add another caller.

Impact

  • Latency: read-side commit (issue 2) couples read latency to write latency. On voice's tight per-turn budget, an unexpected db.commit() blocking the response is hard to predict and hard to debug. Dead Prometheus increments (issue 1) are negligible individually but show up at voice cadence.
  • Correctness: silent transaction commits/rollbacks (issue 2) can clobber unrelated in-flight writes in the caller's session.
  • Maintainability: redundant wrapper (issue 3), threshold mismatch (issue 4), and SQL-as-string decay (issue 8) make changes risky and hard to reason about.
  • Observability: dead metrics (issue 1) are misleading — they look like the service is observable when it isn't.

Proposed Resolution

Single small refactor PR, before adding the voice-v3 reactive recall ear-tool:

  1. Delete the unused Prometheus metric imports and increments. If we want metrics later, add them through whatever observability stack we actually run.
  2. Move last_used_at bump out of the retrieval function — return the fact ids and let a higher-level caller decide when (and on which session) to commit. Or do it in a separate background task. Either way, drop the db.commit() / db.rollback() from the read path.
  3. Delete retrieve_relevant_facts. Keep only retrieve_relevant_facts_cached (renaming to retrieve_relevant_facts if the suffix feels clinical).
  4. Pick one default relevance_threshold based on a quick eyeball of cache hit data; remove the per-call override at RecallTool.
  5. Decide: collapse fallback into a single permissive query OR keep two-stage but document why.
  6. Move durability half-lives to a typed config object; build the decay expression with SQLAlchemy primitives, not string concatenation.
  7. Demote per-call retrieval logs to DEBUG; emit a single INFO summary line per call with counts only.

Priority

Medium. Not blocking voice-v3 ear-tools — the reactive recall path can ship on top of the current code. But the read/write coupling (issue 2) is a real footgun and the dead metrics (issue 1) are quick wins. Worth a half-day cleanup pass before this file ends up on yet another hot path.

Files Affected

  • apps/backend/swisper/services/fact_retrieval.py — primary cleanup
  • apps/backend/swisper/agents/memory/tools/recall.py — drop threshold override, adjust to single retrieval entry point
  • Any other caller of retrieve_relevant_facts (the legacy non-cached entry point) — update import