Token Usage & Analytics: Overview

Audience: Business stakeholders, product owners, analysts, new team members.

What This Module Does

Every time Swisper calls an LLM — whether to classify an intent, generate a response, extract facts, or embed a document — tokens are consumed. The Token Usage & Analytics module tracks this consumption at a granular level, attributing it to specific graph nodes, users, and conversations. This enables cost monitoring, per-user usage reporting, and identification of which parts of the pipeline consume the most tokens.

The system works in two phases:

During a request — Token counts are accumulated in a Redis hash, updated after each LLM call. This avoids database writes during the critical path.
After the request — The accumulated data is flushed to PostgreSQL and the Redis key is deleted. PostgreSQL serves as the permanent store for analytics queries.
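
The two phases above can be pictured with a minimal sketch. It uses in-memory stand-ins rather than real Redis or PostgreSQL clients, and the key format, field names, and the helper names record_usage and flush are assumptions made for illustration, not the module's actual API.

```python
from collections import defaultdict

# In-memory stand-ins (assumptions): a dict of dicts plays the role of the
# Redis hashes, and a list plays the role of the PostgreSQL table.
fake_redis: dict[str, dict[str, int]] = defaultdict(dict)
fake_postgres: list[dict] = []


def record_usage(correlation_id: str, prompt: int, completion: int) -> None:
    """Phase 1: increment counters in the per-request hash after an LLM call."""
    counters = fake_redis[f"token_usage:{correlation_id}"]  # hypothetical key format
    counters["prompt_tokens"] = counters.get("prompt_tokens", 0) + prompt
    counters["completion_tokens"] = counters.get("completion_tokens", 0) + completion
    counters["total_tokens"] = counters.get("total_tokens", 0) + prompt + completion


def flush(correlation_id: str) -> None:
    """Phase 2: copy the accumulated counters to the permanent store and delete the key."""
    counters = fake_redis.pop(f"token_usage:{correlation_id}", None)
    if counters is None:
        return  # nothing was tracked, or the key has already expired
    fake_postgres.append({"correlation_id": correlation_id, **counters})


# Two LLM calls during one request, then a single write after the request.
record_usage("req-1", prompt=120, completion=30)
record_usage("req-1", prompt=80, completion=20)
flush("req-1")
```

In this sketch the permanent store receives exactly one row per request, however many LLM calls the request made, which is the point of deferring the write.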

Who It Serves

Persona	Need
Platform administrators	Monitoring total LLM spend, identifying high-usage users, and detecting unexpected cost spikes
Product owners	Understanding which features (nodes) consume the most tokens to inform optimization priorities
Backend developers	Debugging token consumption for specific requests or jobs using the correlation ID
Finance	Aggregated usage reports for cost allocation

Key Capabilities

Per-call recording — Each LLM invocation records: category (structured_output, streaming, embedding), total tokens, prompt tokens, completion tokens, and the agent/node type that made the call.
Node-type breakdown — A JSONB field stores per-node token usage (e.g., how many tokens intent_classification vs. user_interface vs. fact_extraction consumed in a single request).
Separate tracking for background jobs — Background jobs (email ingestion, calendar sync, etc.) persist to a separate background_job_token_usages table with a job_type field and llm_type_breakdown.
Two-tier storage — Redis during the request lifecycle (TTL 1 hour) for fast incremental updates; PostgreSQL for permanent storage and analytics queries.
Admin analytics API — Six endpoints under /api/v1/analytics/ provide per-user summaries, paginated user lists, and aggregated breakdowns by node type and LLM type. All endpoints are superuser-only.
JSONB aggregations — Analytics queries use PostgreSQL JSONB functions to aggregate token usage across the node_type_breakdown and llm_type_breakdown fields.
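
To make the per-call record and the two breakdown fields more concrete, here is a small sketch of how individual calls could fold into breakdown dictionaries. The LLMCall class and build_breakdowns function are hypothetical. The sketch also assumes that total tokens equal prompt plus completion tokens and that llm_type_breakdown has the same {total, prompt, completion} shape as node_type_breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCall:
    """Hypothetical per-call record with the fields listed above."""
    category: str           # "structured_output", "streaming", or "embedding"
    node_type: str          # agent/node that made the call
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        # Assumption: total is the sum of prompt and completion tokens.
        return self.prompt_tokens + self.completion_tokens


def build_breakdowns(calls):
    """Fold per-call records into node-type and LLM-type breakdown dicts."""
    node_breakdown, llm_breakdown = {}, {}
    for call in calls:
        targets = ((node_breakdown, call.node_type), (llm_breakdown, call.category))
        for breakdown, key in targets:
            entry = breakdown.setdefault(key, {"total": 0, "prompt": 0, "completion": 0})
            entry["total"] += call.total_tokens
            entry["prompt"] += call.prompt_tokens
            entry["completion"] += call.completion_tokens
    return node_breakdown, llm_breakdown


calls = [
    LLMCall("structured_output", "intent_classification", 100, 10),
    LLMCall("streaming", "user_interface", 300, 150),
    LLMCall("structured_output", "fact_extraction", 200, 40),
]
node_type_breakdown, llm_type_breakdown = build_breakdowns(calls)
```

Both dictionaries are plain JSON-serializable mappings, so they could be stored in a JSONB column without further conversion.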

How It Fits in the Platform

LLM Adapter — SwisperLLMAdapter and legacy TokenTrackingLLMAdapter call record_llm_usage() after every LLM invocation (structured output, streaming, embedding).
Session Init — session_init node calls initialize_tracking() to create the Redis hash at the start of each conversation turn.
Streaming — The streaming response handler calls persist_to_postgres() after the final chunk is sent, flushing Redis data to PostgreSQL.
Background Jobs — ingest_emails_job and ingest_calendar_events_job initialize tracking per user and call persist_background_job_to_postgres() on completion.
Rate Limiting — The token rate limiter queries the token_usages and background_job_token_usages tables to enforce per-user token limits over a sliding window.
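
The LLM Adapter integration described above follows a wrapper pattern: the adapter makes the call, then records usage before returning. The sketch below illustrates that pattern only. TrackingAdapter, fake_llm, the response shape, and the record_llm_usage signature shown here are all assumptions and may differ from SwisperLLMAdapter and the real record_llm_usage().

```python
from typing import Callable

# Stand-in for the Redis-backed store (assumption for this sketch).
usage_log: list[dict] = []


def record_llm_usage(node_type: str, category: str,
                     prompt_tokens: int, completion_tokens: int) -> None:
    """Hypothetical signature; the real record_llm_usage() may take different arguments."""
    usage_log.append({
        "node_type": node_type,
        "category": category,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    })


class TrackingAdapter:
    """Wraps an LLM client callable and records usage after every invocation."""

    def __init__(self, llm_call: Callable[[str], dict], node_type: str) -> None:
        self._llm_call = llm_call
        self._node_type = node_type

    def structured_output(self, prompt: str) -> str:
        response = self._llm_call(prompt)
        usage = response["usage"]
        # Recording lives inside the adapter, so individual nodes cannot forget it.
        record_llm_usage(self._node_type, "structured_output",
                         usage["prompt_tokens"], usage["completion_tokens"])
        return response["text"]


def fake_llm(prompt: str) -> dict:
    """Fake client that counts whitespace-separated words as prompt tokens."""
    return {"text": "greeting",
            "usage": {"prompt_tokens": len(prompt.split()), "completion_tokens": 1}}


adapter = TrackingAdapter(fake_llm, node_type="intent_classification")
result = adapter.structured_output("hello there Swisper")
```

Centralizing the recording call in the adapter means any code path that bypasses the adapter also bypasses tracking.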

Limits and Edge Cases

No model or cost tracking per call. The system records token counts and node types but does not record the specific LLM model used or the dollar cost per call. Cost must be derived by cross-referencing with the llm_node_configuration table.
Redis TTL expiry. If a request runs longer than the 1-hour Redis key TTL, the key can expire and the accumulated counts may be lost before they are persisted. This is unlikely for interactive requests but possible for very long-running background jobs.
Correlation ID linkage. Each tracking session is linked via correlation_id, which connects the Redis hash, the PostgreSQL row, and the application logs. If the correlation ID is not set (e.g., a code path that bypasses initialize_tracking()), usage is not tracked.
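
Since cost is not stored per call, it has to be estimated after the fact. The sketch below shows one way such an estimate could be computed from a node_type_breakdown value. The node-to-model mapping stands in for whatever llm_node_configuration contains, and the model names and prices are invented placeholders, not real figures.

```python
from decimal import Decimal

# Hypothetical node-to-model mapping, as it might be read from llm_node_configuration.
NODE_MODELS = {
    "intent_classification": "small-model",
    "user_interface": "large-model",
}

# Placeholder prices in USD per 1,000 tokens: (prompt, completion). Not real figures.
PRICE_PER_1K = {
    "small-model": (Decimal("0.10"), Decimal("0.20")),
    "large-model": (Decimal("1.00"), Decimal("2.00")),
}


def estimate_cost(node_type_breakdown: dict) -> Decimal:
    """Estimate the cost of one request from its per-node token counts."""
    total = Decimal("0")
    for node, counts in node_type_breakdown.items():
        prompt_price, completion_price = PRICE_PER_1K[NODE_MODELS[node]]
        total += prompt_price * counts["prompt"] / 1000
        total += completion_price * counts["completion"] / 1000
    return total


breakdown = {
    "intent_classification": {"total": 1500, "prompt": 1000, "completion": 500},
    "user_interface": {"total": 3000, "prompt": 2000, "completion": 1000},
}
estimated = estimate_cost(breakdown)
```

An estimate of this kind reflects the configuration at the time of the calculation. If a node's model was changed after the usage was recorded, the result would be approximate at best.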

FAQ

Q: Who can access the analytics API? A: Only superusers. All endpoints under /api/v1/analytics/ require superuser authentication.

Q: How do I find how many tokens a specific conversation used? A: Query the token_usages table by chat_id to get per-request usage within that conversation.
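
As an illustration of the chat_id lookup, the following sketch runs the query against an in-memory SQLite database standing in for PostgreSQL. Only the table name and chat_id come from the description above. The other column names and the sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_usages (correlation_id TEXT, chat_id TEXT, total_tokens INTEGER)"
)
conn.executemany(
    "INSERT INTO token_usages VALUES (?, ?, ?)",
    [("req-1", "chat-a", 250), ("req-2", "chat-a", 400), ("req-3", "chat-b", 90)],
)

# One row per request within the conversation.
per_request = conn.execute(
    "SELECT correlation_id, total_tokens FROM token_usages "
    "WHERE chat_id = ? ORDER BY correlation_id",
    ("chat-a",),
).fetchall()

# Summing the rows gives the conversation total.
conversation_total = sum(tokens for _, tokens in per_request)
conn.close()
```

Against PostgreSQL the SQL would be essentially the same, although the parameter placeholder depends on the driver in use.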

Q: Does background job token usage count toward a user's rate limit? A: Yes. The token rate limiter sums tokens from both token_usages and background_job_token_usages for the user within the time window.
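
A sliding-window check that counts both tables can be sketched as follows. The row shape (user_id, total_tokens, created_at), the function names, and the window and limit values are assumptions for illustration. The actual rate limiter presumably does the summing in SQL.

```python
from datetime import datetime, timedelta, timezone


def tokens_in_window(rows, user_id, now, window):
    """Sum a user's tokens for rows created within the last `window`."""
    cutoff = now - window
    return sum(
        row["total_tokens"]
        for row in rows
        if row["user_id"] == user_id and row["created_at"] >= cutoff
    )


def is_over_limit(token_usages, background_job_token_usages, user_id, now, window, limit):
    """Interactive and background usage both count toward the same limit."""
    used = (
        tokens_in_window(token_usages, user_id, now, window)
        + tokens_in_window(background_job_token_usages, user_id, now, window)
    )
    return used >= limit


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
interactive_rows = [
    {"user_id": "u1", "total_tokens": 600, "created_at": now - timedelta(minutes=30)},
    {"user_id": "u1", "total_tokens": 900, "created_at": now - timedelta(hours=3)},  # outside window
]
background_rows = [
    {"user_id": "u1", "total_tokens": 500, "created_at": now - timedelta(minutes=15)},
]

blocked = is_over_limit(interactive_rows, background_rows, "u1",
                        now, window=timedelta(hours=1), limit=1000)
```

In this example neither table alone exceeds the limit within the window, but the combined 1,100 tokens do, so a background job can push a user over the limit.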

Q: How granular is the node-type breakdown? A: Per request. Each row in token_usages has a node_type_breakdown JSONB field mapping node names to {total, prompt, completion} token counts. The analytics API aggregates these across requests.
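
The cross-request aggregation can be illustrated in plain Python. This is only an equivalent of what the JSONB queries compute, not the API's implementation. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def aggregate_breakdowns(rows):
    """Sum {total, prompt, completion} per node across many token_usages rows."""
    totals = defaultdict(lambda: {"total": 0, "prompt": 0, "completion": 0})
    for row in rows:
        for node, counts in row["node_type_breakdown"].items():
            for field in ("total", "prompt", "completion"):
                totals[node][field] += counts.get(field, 0)
    return dict(totals)


rows = [
    {"node_type_breakdown": {
        "intent_classification": {"total": 110, "prompt": 100, "completion": 10},
        "user_interface": {"total": 450, "prompt": 300, "completion": 150},
    }},
    {"node_type_breakdown": {
        "intent_classification": {"total": 90, "prompt": 80, "completion": 10},
    }},
]
aggregated = aggregate_breakdowns(rows)
```

Nodes that did not run in a given request are simply absent from that row's breakdown, so they contribute nothing to the aggregate.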