Rate Limiting: Overview
Audience: Business stakeholders, product owners, analysts, new team members.
What This Module Does
Rate limiting prevents abuse and controls costs by restricting how frequently users can make requests and how many LLM tokens they can consume. Swisper implements three independent rate limiting layers, each protecting against a different threat:
| Layer | What It Protects Against | Storage |
|---|---|---|
| Auth Endpoint | Rapid-fire API requests from authenticated users (DoS, scripting) | Redis |
| Token Budget | Excessive LLM consumption by a single user (cost control) | PostgreSQL |
| Login | Brute-force authentication attempts | Redis |
All three layers use a sliding window algorithm and fail open — if the storage backend (Redis or PostgreSQL) is unavailable, requests are allowed through to avoid blocking legitimate users during infrastructure issues.
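The sliding window and fail-open behaviour described above can be illustrated with a minimal in-memory sketch. All names here (`InMemoryStore`, `UnavailableStore`, `SlidingWindowLimiter`, `allow`) are hypothetical and not taken from the Swisper codebase; the real limiters keep their state in Redis or PostgreSQL, and whether rejected requests also count towards the window is an assumption of this sketch.

```python
import time
from collections import defaultdict, deque


class InMemoryStore:
    """Stand-in for Redis/PostgreSQL: keeps request timestamps per key."""

    def __init__(self):
        self._hits = defaultdict(deque)

    def record_and_count(self, key, now, window_seconds):
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - window_seconds:
            hits.popleft()
        # Assumption: every request is recorded, including rejected ones.
        hits.append(now)
        return len(hits)


class UnavailableStore:
    """Simulates a storage backend that is down."""

    def record_and_count(self, key, now, window_seconds):
        raise ConnectionError("storage backend unavailable")


class SlidingWindowLimiter:
    def __init__(self, store, limit, window_seconds):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        try:
            count = self.store.record_and_count(key, now, self.window_seconds)
        except Exception:
            # Fail open: a storage outage must not block legitimate users.
            return True
        return count <= self.limit
```

With `limit=2` and a 20-second window, a third request inside the window is rejected, while a request arriving after the earlier ones have aged out is accepted again. Swapping in `UnavailableStore` makes every call return `True`, which is the availability-over-enforcement trade-off that fail open implies.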
Who It Serves
| Persona | Need |
|---|---|
| End users | Limits high enough that normal usage is not throttled, and clear feedback (HTTP 429 with Retry-After) when a limit is hit |
| Platform administrators | Configurable limits per layer, ability to adjust token budgets without code changes |
| Security team | Brute-force protection on login, per-IP rate limiting on API endpoints |
| Finance | Per-user token budgets that cap LLM cost exposure |
Key Capabilities
- Three independent layers: Each layer operates independently with its own storage, configuration, and failure mode. A request can pass endpoint rate limiting but be blocked by token rate limiting, or vice versa.
- Dual key tracking: Both the auth endpoint limiter and the login limiter track requests by two keys simultaneously (user ID + IP address, or email + IP address). Both limits must pass for the request to proceed.
- Configurable token budgets: Token rate limits are stored in the database (`configuration` table) and can be updated at runtime without redeployment. The default is 1,500,000 tokens per 3-hour window.
- Burst allowance: The token rate limiter allows a 10% burst above the configured limit (e.g., 1,650,000 tokens on a 1,500,000 limit) to avoid penalizing users who are slightly over budget.
- Fail-open design: All limiters catch storage errors and allow the request through. This prioritizes availability over strict enforcement.
- Rate limit status in responses: The streaming response and session refresh endpoints include rate limit status, allowing the frontend to show remaining budget to the user.
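The burst arithmetic and the kind of status a frontend could display can be checked with a short sketch. `TokenBudgetStatus` and `check_token_budget` are hypothetical names, and both the shape of the status object and the rule that a user is blocked once usage reaches the burst-inclusive limit are assumptions; only the default limit and the 10% burst come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudgetStatus:
    allowed: bool
    used: int
    limit: int        # configured limit, without burst
    hard_limit: int   # configured limit plus the burst allowance
    remaining: int    # tokens left before the configured limit


def check_token_budget(used, max_tokens=1_500_000, burst_percent=10):
    # Integer arithmetic avoids float rounding on large token counts.
    hard_limit = max_tokens + max_tokens * burst_percent // 100
    return TokenBudgetStatus(
        # Assumption: requests are blocked once usage reaches the hard limit.
        allowed=used < hard_limit,
        used=used,
        limit=max_tokens,
        hard_limit=hard_limit,
        remaining=max(0, max_tokens - used),
    )
```

For example, a user at 1,600,000 tokens is over the configured budget (`remaining` is 0) but still inside the 1,650,000 burst ceiling, so `allowed` stays `True`.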
How It Fits in the Platform
- Auth Middleware: The endpoint rate limiter is applied as FastAPI middleware to all authenticated requests before they reach the route handler.
- Chat & Streaming: The token rate limiter is checked when creating a chat and posting a message. Rate limit status is included in the final streaming chunk.
- Login: The login rate limiter is checked before validating credentials. Successful login clears the email-based counter.
- Token Analytics: The token rate limiter queries the same `token_usages` and `background_job_token_usages` tables used by the analytics module.
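Because the token limiter draws on both usage tables, interactive and background consumption end up in one total. The sketch below shows that aggregation in plain Python rather than SQL. The function name and the `(created_at, total_tokens)` row shape are hypothetical, since the tables' actual columns are not documented on this page.

```python
from datetime import datetime, timedelta


def tokens_used_in_window(interactive_rows, background_rows,
                          now: datetime, window_hours: int = 3) -> int:
    """Sum tokens from both usage sources inside the rolling window.

    Each row is a hypothetical (created_at, total_tokens) tuple.
    """
    cutoff = now - timedelta(hours=window_hours)
    return sum(
        tokens
        for created_at, tokens in (*interactive_rows, *background_rows)
        if created_at > cutoff
    )
```

A chat message from one hour ago and a background job from thirty minutes ago both count towards the same 3-hour total, while usage from four hours ago has already dropped out of the window.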
Default Limits
| Layer | Dimension | Limit | Window |
|---|---|---|---|
| Auth Endpoint | Per user | 100 requests | 20 seconds |
| Auth Endpoint | Per IP | 80 requests | 20 seconds |
| Token Budget | Per user | 1,500,000 tokens (+ 10% burst) | 3 hours |
| Login | Per email | 5 attempts | 15 minutes |
| Login | Per IP | 30 attempts | 15 minutes |
Limits and Edge Cases
- No per-endpoint differentiation. The auth endpoint rate limiter uses a single `auth` endpoint key for all authenticated routes. A user making many lightweight GET requests counts the same as one making expensive POST requests.
- Token budget includes background jobs. If a user's background jobs (email ingestion, classification) consume many tokens, this reduces their interactive token budget for the same window.
- Login rate limit clears on success. A successful login clears the email-based counter but not the IP-based counter. An attacker cycling through emails from the same IP would still be rate limited.
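The login behaviour in the last bullet, combined with the dual email and IP keys, can be sketched as follows. `LoginLimiter`, `check`, and `on_success` are hypothetical names, and the in-memory dictionary stands in for Redis. The limits are the defaults listed on this page.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 15 * 60
MAX_PER_EMAIL = 5
MAX_PER_IP = 30


class LoginLimiter:
    """Hypothetical in-memory stand-in for the Redis-backed login limiter."""

    def __init__(self):
        self._attempts = defaultdict(deque)

    def _recent(self, key, now):
        attempts = self._attempts[key]
        # Discard attempts that have slid out of the 15-minute window.
        while attempts and attempts[0] <= now - WINDOW_SECONDS:
            attempts.popleft()
        return attempts

    def check(self, email, ip, now):
        """Record an attempt; both the email and the IP limit must pass."""
        email_attempts = self._recent(f"email:{email}", now)
        ip_attempts = self._recent(f"ip:{ip}", now)
        email_attempts.append(now)
        ip_attempts.append(now)
        return (len(email_attempts) <= MAX_PER_EMAIL
                and len(ip_attempts) <= MAX_PER_IP)

    def on_success(self, email):
        # Only the email counter is cleared; the IP counter keeps counting.
        self._attempts.pop(f"email:{email}", None)
```

After a successful login the same email can be tried again immediately, but an attacker rotating through different emails from one IP still runs into the per-IP ceiling of 30 attempts.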
FAQ
Q: What happens when a user hits a rate limit?
A: They receive an HTTP 429 response with a Retry-After header indicating when they can try again.
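As an illustration of how a Retry-After value could be derived for a sliding window, the sketch below waits until the oldest request in the window ages out. This derivation is an assumption, not a description of Swisper's code, and `too_many_requests` and the response body fields are hypothetical. The header itself is standard HTTP and accepts a delay in whole seconds.

```python
import math


def too_many_requests(oldest_hit, window_seconds, now):
    """Build a 429 response as a (status, headers, body) tuple.

    `oldest_hit` is the timestamp of the oldest request still inside the
    window; a slot frees up once that request slides out.
    """
    retry_after = max(1, math.ceil(oldest_hit + window_seconds - now))
    headers = {"Retry-After": str(retry_after)}
    body = {"detail": "Rate limit exceeded", "retry_after_seconds": retry_after}
    return 429, headers, body
```

With a 20-second window, an oldest request at `t=100.0`, and the current time at `t=112.5`, the client is told to retry in 8 seconds.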
Q: Can I change the token budget without redeploying?
A: Yes. The token rate limit parameters (TOKEN_LIMIT_MAX_TOKENS, TOKEN_LIMIT_WINDOW_HOURS, TOKEN_LIMIT_BURST_ALLOWANCE) are stored in the configuration table and read at runtime.
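A minimal sketch of reading these three parameters at runtime, falling back to the documented defaults when a row is missing. The `{key: value}` row shape, the string encoding of values, and storing the burst as a fraction (`0.10`) are all assumptions; `TokenLimitSettings` and `load_token_limits` are hypothetical names.

```python
from dataclasses import dataclass

# Defaults from this page. Storing the burst as a fraction is an
# assumption; the real table may use a percentage or another encoding.
DEFAULTS = {
    "TOKEN_LIMIT_MAX_TOKENS": "1500000",
    "TOKEN_LIMIT_WINDOW_HOURS": "3",
    "TOKEN_LIMIT_BURST_ALLOWANCE": "0.10",
}


@dataclass(frozen=True)
class TokenLimitSettings:
    max_tokens: int
    window_hours: int
    burst_allowance: float


def load_token_limits(rows):
    """Build settings from hypothetical {key: value} configuration rows."""
    merged = {**DEFAULTS, **rows}
    return TokenLimitSettings(
        max_tokens=int(merged["TOKEN_LIMIT_MAX_TOKENS"]),
        window_hours=int(merged["TOKEN_LIMIT_WINDOW_HOURS"]),
        burst_allowance=float(merged["TOKEN_LIMIT_BURST_ALLOWANCE"]),
    )
```

Calling `load_token_limits` on each check, or behind a short-lived cache, is what would let an updated row take effect without a redeploy.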
Q: What does "fail open" mean?
A: If Redis or PostgreSQL is temporarily unavailable, the rate limiter allows the request through instead of blocking it. This avoids outages caused by the rate limiting infrastructure itself.
Q: Does rate limiting apply to admin/superuser requests?
A: Yes. The auth endpoint limiter applies to all authenticated requests, and the token rate limiter is checked for all users when creating chats and posting messages.