Token Usage & Analytics — Overview
Audience: Business stakeholders, product owners, analysts, new team members.
What This Module Does
Every time Swisper calls an LLM — whether to classify an intent, generate a response, extract facts, or embed a document — tokens are consumed. The Token Usage & Analytics module tracks this consumption at a granular level, attributing it to specific graph nodes, users, and conversations. This enables cost monitoring, per-user usage reporting, and identification of which parts of the pipeline consume the most tokens.
The system works in two phases:
- During a request — Token counts are accumulated in a Redis hash, updated after each LLM call. This avoids database writes during the critical path.
- After the request — The accumulated data is flushed to PostgreSQL and the Redis key is deleted. PostgreSQL serves as the permanent store for analytics queries.
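The two-phase flow can be illustrated with a minimal sketch. It assumes a hypothetical key layout (`token_usage:{correlation_id}`), hypothetical hash field names, and a simplified per-node breakdown that keeps only totals. `FakeRedis` is an in-memory stand-in that mirrors the redis-py method names `hincrby`, `hgetall`, `expire`, and `delete`, and `sqlite3` stands in for PostgreSQL. The actual `record_llm_usage()` and `persist_to_postgres()` implementations may differ.

```python
import json
import sqlite3
from collections import defaultdict

TTL_SECONDS = 3600  # the 1-hour Redis key TTL described above


class FakeRedis:
    """In-memory stand-in mirroring a few redis-py hash methods."""

    def __init__(self):
        self._hashes = defaultdict(dict)
        self.ttls = {}

    def hincrby(self, name, key, amount=1):
        h = self._hashes[name]
        h[key] = h.get(key, 0) + amount
        return h[key]

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))

    def expire(self, name, seconds):
        self.ttls[name] = seconds

    def delete(self, name):
        self._hashes.pop(name, None)
        self.ttls.pop(name, None)


def usage_key(correlation_id):
    return f"token_usage:{correlation_id}"  # hypothetical key layout


def record_call(r, correlation_id, node, prompt_tokens, completion_tokens):
    """Phase 1: accumulate counts in the Redis hash after each LLM call."""
    key = usage_key(correlation_id)
    total = prompt_tokens + completion_tokens
    r.hincrby(key, "prompt_tokens", prompt_tokens)
    r.hincrby(key, "completion_tokens", completion_tokens)
    r.hincrby(key, "total_tokens", total)
    r.hincrby(key, f"node:{node}", total)  # per-node total only, for brevity
    r.expire(key, TTL_SECONDS)


def flush(r, db, correlation_id):
    """Phase 2: write one row to the SQL store, then delete the Redis key."""
    key = usage_key(correlation_id)
    data = r.hgetall(key)
    if not data:
        return False  # nothing recorded, or the key already expired
    breakdown = {
        field.split(":", 1)[1]: int(value)
        for field, value in data.items()
        if field.startswith("node:")
    }
    db.execute(
        "INSERT INTO token_usages (correlation_id, prompt_tokens, completion_tokens,"
        " total_tokens, node_type_breakdown) VALUES (?, ?, ?, ?, ?)",
        (
            correlation_id,
            int(data["prompt_tokens"]),
            int(data["completion_tokens"]),
            int(data["total_tokens"]),
            json.dumps(breakdown),
        ),
    )
    db.commit()
    r.delete(key)
    return True


if __name__ == "__main__":
    r = FakeRedis()
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE token_usages (correlation_id TEXT, prompt_tokens INTEGER,"
        " completion_tokens INTEGER, total_tokens INTEGER, node_type_breakdown TEXT)"
    )
    record_call(r, "req-1", "intent_classification", 120, 8)
    record_call(r, "req-1", "user_interface", 900, 250)
    flush(r, db, "req-1")
    print(db.execute("SELECT * FROM token_usages").fetchone())
```

Incrementing hash fields keeps each LLM call to a few small Redis writes, and the single `INSERT` happens only once the request is complete. With a real redis-py client, `hgetall` returns bytes unless the client is created with `decode_responses=True`.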
Who It Serves
| Persona | Need |
|---|---|
| Platform administrators | Monitoring total LLM spend, identifying high-usage users, and detecting unexpected cost spikes |
| Product owners | Understanding which features (nodes) consume the most tokens to inform optimization priorities |
| Backend developers | Debugging token consumption for specific requests or jobs using the correlation ID |
| Finance | Aggregated usage reports for cost allocation |
Key Capabilities
- Per-call recording — Each LLM invocation records its category (`structured_output`, `streaming`, or `embedding`), total tokens, prompt tokens, completion tokens, and the agent/node type that made the call.
- Node-type breakdown — A JSONB field stores per-node token usage (e.g., how many tokens `intent_classification` vs. `user_interface` vs. `fact_extraction` consumed in a single request).
- Separate tracking for background jobs — Background jobs (email ingestion, calendar sync, etc.) persist to a separate `background_job_token_usages` table with a `job_type` field and an `llm_type_breakdown` field.
- Two-tier storage — Redis during the request lifecycle (TTL 1 hour) for fast incremental updates; PostgreSQL for permanent storage and analytics queries.
- Admin analytics API — Six endpoints under `/api/v1/analytics/` provide per-user summaries, paginated user lists, and aggregated breakdowns by node type and LLM type. All endpoints are superuser-only.
- JSONB aggregations — Analytics queries use PostgreSQL JSONB functions to aggregate token usage across the `node_type_breakdown` and `llm_type_breakdown` fields.
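To make the breakdown shape concrete, the following sketch aggregates several `node_type_breakdown` values the way an analytics query might, summing `total`, `prompt`, and `completion` per node and ranking nodes by total. It is plain Python over dicts shaped like the JSONB values described in the FAQ; the function name, node names, and numbers are illustrative only. In PostgreSQL a comparable result could come from expanding the field with `jsonb_each()` and grouping by key, but the module's actual queries are not shown here.

```python
def aggregate_node_breakdowns(breakdowns):
    """Sum per-node token counts across many node_type_breakdown values.

    Each input dict maps node name -> {"total": int, "prompt": int, "completion": int},
    mirroring one row's JSONB field. Returns nodes ordered by total, largest first.
    """
    totals = {}
    for breakdown in breakdowns:
        for node, counts in breakdown.items():
            agg = totals.setdefault(node, {"total": 0, "prompt": 0, "completion": 0})
            for field in agg:
                agg[field] += counts.get(field, 0)
    return dict(sorted(totals.items(), key=lambda item: item[1]["total"], reverse=True))


if __name__ == "__main__":
    rows = [
        {"intent_classification": {"total": 128, "prompt": 120, "completion": 8},
         "user_interface": {"total": 1150, "prompt": 900, "completion": 250}},
        {"intent_classification": {"total": 100, "prompt": 95, "completion": 5},
         "fact_extraction": {"total": 400, "prompt": 350, "completion": 50}},
    ]
    for node, counts in aggregate_node_breakdowns(rows).items():
        print(node, counts)
```

Ranking by total answers the product-owner question directly: which nodes are the biggest consumers across a set of requests.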
How It Fits in the Platform
- LLM Adapter — `SwisperLLMAdapter` and the legacy `TokenTrackingLLMAdapter` call `record_llm_usage()` after every LLM invocation (structured output, streaming, embedding).
- Session Init — The `session_init` node calls `initialize_tracking()` to create the Redis hash at the start of each conversation turn.
- Streaming — The streaming response handler calls `persist_to_postgres()` after the final chunk is sent, flushing the Redis data to PostgreSQL.
- Background Jobs — `ingest_emails_job` and `ingest_calendar_events_job` initialize tracking per user and call `persist_background_job_to_postgres()` on completion.
- Rate Limiting — The token rate limiter queries the `token_usages` and `background_job_token_usages` tables to enforce per-user token limits over a sliding window.
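The adapter integration point can be sketched as a thin wrapper that calls the model and then reports usage. `TrackingAdapter`, `LLMResult`, and the keyword arguments passed to the recorder are hypothetical names chosen for illustration; they are not the signatures of `SwisperLLMAdapter` or `record_llm_usage()`. Only the three category names come from this page.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_CATEGORIES = {"structured_output", "streaming", "embedding"}


@dataclass
class LLMResult:
    text: str
    prompt_tokens: int
    completion_tokens: int


class TrackingAdapter:
    """Hypothetical wrapper: invoke the model, then report token usage."""

    def __init__(self, llm_call: Callable[[str], LLMResult], record_usage: Callable[..., None]):
        self._llm_call = llm_call
        self._record_usage = record_usage

    def invoke(self, prompt: str, *, node_type: str, category: str = "structured_output") -> LLMResult:
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown usage category: {category}")
        result = self._llm_call(prompt)
        # Usage is recorded only after the call returns successfully.
        self._record_usage(
            category=category,
            node_type=node_type,
            prompt_tokens=result.prompt_tokens,
            completion_tokens=result.completion_tokens,
            total_tokens=result.prompt_tokens + result.completion_tokens,
        )
        return result


if __name__ == "__main__":
    recorded = []
    adapter = TrackingAdapter(
        llm_call=lambda p: LLMResult(text="ok", prompt_tokens=10, completion_tokens=3),
        record_usage=lambda **usage: recorded.append(usage),
    )
    adapter.invoke("Classify this message", node_type="intent_classification")
    print(recorded)
```

Because usage is reported only after the call returns, a call that raises records nothing. Injecting the recorder as a callable also keeps the wrapper testable without Redis.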
Limits and Edge Cases
- No model or cost tracking per call. The system records token counts and node types but does not record the specific LLM model used or the dollar cost per call. Cost must be derived by cross-referencing with the `llm_node_configuration` table.
- Redis TTL expiry. If a request takes longer than 1 hour (the Redis key TTL), the accumulated data may be lost before persistence. This is unlikely for interactive requests but theoretically possible for very long-running jobs.
- Correlation ID linkage. Each tracking session is linked via `correlation_id`, which connects the Redis hash, the PostgreSQL row, and the application logs. If the correlation ID is not set (e.g., on a code path that bypasses `initialize_tracking()`), usage is not tracked.
FAQ
Q: Who can access the analytics API?
A: Only superusers. All endpoints under `/api/v1/analytics/` require superuser authentication.
Q: How do I find how many tokens a specific conversation used?
A: Query the `token_usages` table by `chat_id` to get per-request usage within that conversation.
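A minimal sketch of that lookup, using `sqlite3` as a stand-in for PostgreSQL. The columns `correlation_id`, `total_tokens`, and `created_at` and the function name `conversation_usage` are assumptions; only the `token_usages` table and the `chat_id` filter come from this page. With a PostgreSQL driver such as psycopg, the `?` placeholder would become `%s`.

```python
import sqlite3


def conversation_usage(conn, chat_id):
    """Return per-request (correlation_id, total_tokens) rows for one chat, plus their sum."""
    rows = conn.execute(
        "SELECT correlation_id, total_tokens FROM token_usages"
        " WHERE chat_id = ? ORDER BY created_at",
        (chat_id,),
    ).fetchall()
    return rows, sum(total for _, total in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token_usages (correlation_id TEXT, chat_id TEXT,"
        " total_tokens INTEGER, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO token_usages VALUES (?, ?, ?, ?)",
        [
            ("req-1", "chat-a", 300, "2024-01-01T10:00:00"),
            ("req-2", "chat-a", 700, "2024-01-01T10:05:00"),
            ("req-3", "chat-b", 50, "2024-01-01T10:06:00"),
        ],
    )
    print(conversation_usage(conn, "chat-a"))
```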
Q: Does background job token usage count toward a user's rate limit?
A: Yes. The token rate limiter sums tokens from both `token_usages` and `background_job_token_usages` for the user within the time window.
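The combined check can be sketched as one query over both tables with `UNION ALL`. The `user_id`, `total_tokens`, and `created_at` columns, the ISO-8601 text timestamps, the function names, and the at-or-above comparison are assumptions; the actual limiter's window length and thresholds are not documented here. `sqlite3` again stands in for PostgreSQL.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def tokens_in_window(conn, user_id, window, now):
    """Sum a user's tokens from both tables since now - window."""
    # ISO-8601 strings with the same UTC offset compare correctly as text.
    since = (now - window).isoformat()
    sql = """
        SELECT COALESCE(SUM(total_tokens), 0) FROM (
            SELECT total_tokens FROM token_usages
             WHERE user_id = ? AND created_at >= ?
            UNION ALL  -- not UNION: equal counts must not be de-duplicated
            SELECT total_tokens FROM background_job_token_usages
             WHERE user_id = ? AND created_at >= ?
        ) AS combined
    """
    (total,) = conn.execute(sql, (user_id, since, user_id, since)).fetchone()
    return total


def is_over_limit(conn, user_id, limit, window, now):
    return tokens_in_window(conn, user_id, window, now) >= limit


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("token_usages", "background_job_token_usages"):
        conn.execute(f"CREATE TABLE {table} (user_id TEXT, total_tokens INTEGER, created_at TEXT)")
    now = datetime.now(timezone.utc)
    conn.execute("INSERT INTO token_usages VALUES (?, ?, ?)",
                 ("u1", 1000, (now - timedelta(minutes=10)).isoformat()))
    conn.execute("INSERT INTO background_job_token_usages VALUES (?, ?, ?)",
                 ("u1", 1000, (now - timedelta(minutes=30)).isoformat()))
    print(tokens_in_window(conn, "u1", timedelta(hours=1), now))
```

`UNION ALL` matters here: plain `UNION` would collapse two rows with the same token count into one and undercount the user's usage.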
Q: How granular is the node-type breakdown?
A: Per request. Each row in `token_usages` has a `node_type_breakdown` JSONB field mapping node names to `{total, prompt, completion}` token counts. The analytics API aggregates these across requests.