Background Jobs: Overview
Audience: Business stakeholders, product owners, analysts, new team members.
What This Module Does
Background jobs are tasks that Swisper runs periodically without any user interaction. They keep the system current by fetching new emails, syncing calendar events, classifying messages, and sending proactive notifications. Without background jobs, Swisper would only know about information that users explicitly provide during conversations.
The system currently includes 11 jobs across three categories:
Data Ingestion
| Job | What It Does | Typical Schedule |
|---|---|---|
| Ingest Emails | Fetches new emails from Gmail and Office 365 using delta sync, creates embeddings for search | Every few minutes |
| Ingest Calendar Events | Fetches calendar events, stores attendees and embeddings | Every few minutes |
| Classify Emails | Uses an LLM to classify unprocessed emails: summary, action items, importance, labels | Every 6 hours |
Proactive Notifications
| Job | What It Does | Typical Schedule |
|---|---|---|
| Daily Briefing | Generates a personalized morning briefing for users in the 7:00–8:59 AM local time window | Hourly |
| Important Email Notifications | Alerts users about high-importance emails and approaching deadlines | Every few minutes |
| Pre-Meeting Prep | Sends meeting preparation summaries before upcoming events (30-minute lookahead) | Every 10–30 minutes |
| Commitment Reminders | Reminds users about commitments with deadlines in the next 24 hours | Every 1–2 hours |
| Awaiting Response Notifications | Notifies users about sent emails that have not received a reply | Every 1–2 hours |
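Several schedules above depend on time-window checks. The daily briefing targets users whose local time is between 7:00 and 8:59 AM, and pre-meeting prep and commitment reminders look ahead a fixed period (30 minutes and 24 hours respectively). The sketch below shows one way to express these checks in Python. The function names are hypothetical, and the UTC fallback for users without a configured timezone is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Briefing window from the table: 07:00 up to and including 08:59 local time.
BRIEFING_START_HOUR = 7
BRIEFING_END_HOUR = 9  # exclusive, so 08:59 is the last minute inside the window


def in_briefing_window(now_utc: datetime, user_tz: Optional[tzinfo]) -> bool:
    """Return True if the user's local time falls inside the briefing window.

    Hypothetical fallback: users without a configured timezone are treated as UTC.
    """
    tz = user_tz if user_tz is not None else timezone.utc
    local = now_utc.astimezone(tz)
    return BRIEFING_START_HOUR <= local.hour < BRIEFING_END_HOUR


def within_lookahead(now_utc: datetime, starts_at_utc: datetime, lookahead: timedelta) -> bool:
    """Return True if an event or deadline begins between now and now + lookahead."""
    return now_utc <= starts_at_utc <= now_utc + lookahead


# Lookahead periods taken from the table above.
PRE_MEETING_LOOKAHEAD = timedelta(minutes=30)
COMMITMENT_LOOKAHEAD = timedelta(hours=24)
```

Because the briefing job runs hourly and the window spans two hours, a user can fall inside the window on two consecutive runs; how the real job avoids sending a second briefing is not covered by this sketch.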
System Maintenance
| Job | What It Does | Typical Schedule |
|---|---|---|
| Fact Decay | Reduces `computed_relevance` of aging facts to keep the memory system fresh | Periodic |
| Redis Expiration Monitoring | Checks Redis keys nearing expiry and backs up critical state to PostgreSQL | Periodic |
| Threema Polling | Polls the Threema Gateway for pending registration tokens to activate new integrations | Every 3 seconds |
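The table does not specify how Fact Decay reduces `computed_relevance`. As an illustration only, the sketch below applies exponential decay with a half-life, a common choice for aging scores. The `Fact` dataclass, its `base_relevance` field, and the 30-day half-life are all hypothetical and are not taken from the Swisper codebase.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional

# Hypothetical constant: the real decay parameters are not documented here.
HALF_LIFE_DAYS = 30.0


@dataclass
class Fact:
    base_relevance: float  # hypothetical: relevance before any aging is applied
    created_at: datetime   # timezone-aware creation time
    computed_relevance: float = 0.0


def decay_relevance(fact: Fact, now: datetime, half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: relevance halves for every `half_life_days` days of age."""
    age_days = max((now - fact.created_at).total_seconds() / 86400.0, 0.0)
    return fact.base_relevance * math.pow(0.5, age_days / half_life_days)


def run_fact_decay(facts: Iterable[Fact], now: Optional[datetime] = None) -> None:
    """Recompute `computed_relevance` for every fact in place."""
    now = now or datetime.now(timezone.utc)
    for fact in facts:
        fact.computed_relevance = decay_relevance(fact, now)
```

Recomputing from a stored base value (rather than repeatedly shrinking the current score) keeps the result independent of how often the periodic job happens to run.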
Who It Serves
| Persona | Need |
|---|---|
| End users | Fresh email/calendar data in conversations, timely proactive notifications, and relevant long-term memory |
| Operations | Understanding which jobs run, how often, and what they depend on for scheduling and monitoring |
| Backend developers | Adding new jobs following the BaseJob pattern, understanding the job lifecycle and correlation tracing |
Key Capabilities
- Unified job interface: all jobs extend `BaseJob`, which provides automatic correlation ID generation, structured logging with timing, and error capture. Developers implement a single `execute()` method.
- CLI dispatch: jobs are invoked via `python -m swisper.jobs.main <job_name>`, making them easy to run in Docker, Kubernetes CronJobs, or local development.
- External scheduling: the system does not include an internal scheduler. Jobs are triggered by external cron or orchestration tools, so schedules can change without code changes.
- Correlation ID tracing: each job run generates a unique `job-{uuid}` correlation ID that propagates through all logs, LLM calls, and database operations for that run. This enables end-to-end tracing of a single job execution.
- Token tracking integration: data ingestion jobs initialize token tracking at the start and persist usage to the `background_job_token_usages` table on completion, enabling cost attribution per job type.
- LLM-powered notifications: notification jobs use LLMs to personalize message content based on user context, facts, and preferences. Prompt templates are stored as markdown files in `swisper/jobs/prompts/`.
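A minimal sketch of the `BaseJob` pattern described above, assuming a `contextvars`-based correlation ID and the standard `logging` module. Only `BaseJob`, `execute()`, `run()`, and the `job-{uuid}` format come from this page; the filter class, the context variable, and the boolean return value (added only so the behavior is easy to check) are illustrative assumptions.

```python
import contextvars
import logging
import time
import uuid
from abc import ABC, abstractmethod

# Hypothetical: holds the correlation ID of the job run currently executing.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto each record passing through this logger."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True


logger = logging.getLogger("jobs")
logger.addFilter(CorrelationIdFilter())


class BaseJob(ABC):
    name = "base"

    @abstractmethod
    def execute(self) -> None:
        """Job-specific work; subclasses implement only this method."""

    def run(self) -> bool:
        """Wrap execute() with a correlation ID, timing, and error capture."""
        token = correlation_id_var.set(f"job-{uuid.uuid4()}")
        start = time.monotonic()
        try:
            logger.info("job started", extra={"job": self.name})
            self.execute()
            logger.info("job finished", extra={"job": self.name, "duration_s": time.monotonic() - start})
            return True
        except Exception:
            # Log with traceback and duration, then return instead of re-raising.
            logger.exception("job failed", extra={"job": self.name, "duration_s": time.monotonic() - start})
            return False
        finally:
            correlation_id_var.reset(token)
```

A `ContextVar` lets the ID follow the run through nested calls without being passed explicitly. In a real deployment the filter would more likely be attached to the log handler, so records from every logger carry the ID.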
How It Fits in the Platform
- Integrations: email and calendar ingestion jobs use OAuth tokens stored by the Integrations module to access the Gmail and Office 365 APIs.
- Signals & Notifications: all notification jobs deliver their messages through `SignalsService`, which handles channel selection and delivery.
- Fact System: the fact decay job maintains the long-term memory system by reducing the relevance of aging facts.
- Token Analytics: job token usage is tracked separately in `background_job_token_usages` and counts toward the user's rate limit.
- LLM Adapter: jobs use `SwisperLLMAdapter` for embeddings and classification, respecting the per-node provider configuration.
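To illustrate per-run token accounting, the sketch below accumulates usage in memory while a job runs and writes one row per user and model on completion. It uses SQLite as a stand-in for PostgreSQL. The `JobTokenTracker` class and the column names are hypothetical; only the table name `background_job_token_usages` comes from this page.

```python
import sqlite3
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Tuple


@dataclass
class JobTokenTracker:
    """Accumulates token usage for one job run (hypothetical helper)."""

    job_type: str
    correlation_id: str
    # Keyed by (user_id, model) so usage can be attributed to each user's rate limit.
    usage: DefaultDict[Tuple[str, str], List[int]] = field(
        default_factory=lambda: defaultdict(lambda: [0, 0])
    )

    def record(self, user_id: str, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        """Add the tokens from one LLM or embedding call to the running totals."""
        totals = self.usage[(user_id, model)]
        totals[0] += prompt_tokens
        totals[1] += completion_tokens

    def persist(self, conn: sqlite3.Connection) -> None:
        """Write one row per (user, model) when the job completes."""
        conn.executemany(
            "INSERT INTO background_job_token_usages "
            "(job_type, correlation_id, user_id, model, prompt_tokens, completion_tokens) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            [
                (self.job_type, self.correlation_id, user_id, model, p, c)
                for (user_id, model), (p, c) in self.usage.items()
            ],
        )
        conn.commit()
```

Aggregating in memory and writing once at the end keeps database traffic low, at the cost of losing the run's usage data if the process dies before `persist` is called.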
Limits and Edge Cases
- No internal scheduler. Jobs rely on external scheduling (cron, Kubernetes CronJobs). If the scheduler fails, jobs do not run and there is no built-in retry or dead-letter mechanism.
- Single-instance assumption. Jobs are not designed for concurrent execution of the same job type. Running two instances of `ingest_emails` simultaneously could cause duplicate processing.
- Token refresh failures. Email and calendar ingestion jobs depend on valid OAuth tokens. If a token refresh fails (for example, because the user revoked access), ingestion fails for that user and the job moves on to the next user.
- Notification timezone dependency. The daily briefing job uses the user's timezone to determine the delivery window. Users without a configured timezone may receive briefings at unexpected times.
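The token-refresh behavior described above amounts to per-user error isolation. The sketch below assumes a hypothetical `TokenRefreshError` and a hypothetical `ingest_one` callable. Only that specific error is caught, so any other exception still propagates to the job's top-level error handling.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("jobs.ingest")


class TokenRefreshError(Exception):
    """Hypothetical error raised when an OAuth token cannot be refreshed."""


def ingest_for_all_users(
    user_ids: Iterable[str], ingest_one: Callable[[str], None]
) -> Dict[str, List[str]]:
    """Run `ingest_one` for each user; a token failure for one user does not stop the rest."""
    results: Dict[str, List[str]] = {"ok": [], "failed": []}
    for user_id in user_ids:
        try:
            ingest_one(user_id)
            results["ok"].append(user_id)
        except TokenRefreshError:
            # Skip this user and continue with the next one.
            logger.warning("token refresh failed; skipping user", extra={"user_id": user_id})
            results["failed"].append(user_id)
    return results
```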
FAQ
Q: How do I add a new background job?
A: Create a class extending `BaseJob`, implement the `execute()` method, add a runner function, and register it in the `JOB_MAP` in `job_registry.py`. The job can then be invoked via `python -m swisper.jobs.main <your_job_name>`.
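The registration steps in this answer can be sketched as a name-to-runner registry plus a small CLI dispatcher. This illustration assumes `argparse`; the runner functions and the `my_new_job` name are hypothetical, and the real `JOB_MAP` in `job_registry.py` and the entry point in `swisper/jobs/main.py` may be structured differently (for example, the real CLI also handles configuration and LLM provider initialization, which is not shown).

```python
import argparse
from typing import Callable, Dict, List, Optional


def run_ingest_emails() -> None:
    """Hypothetical runner; a real one would build the job object and call its run()."""
    print("ingesting emails")


def run_my_new_job() -> None:
    """Hypothetical runner for a newly added job."""
    print("running my new job")


# Illustrative registry mapping CLI job names to runner functions.
JOB_MAP: Dict[str, Callable[[], None]] = {
    "ingest_emails": run_ingest_emails,
    "my_new_job": run_my_new_job,
}


def main(argv: Optional[List[str]] = None) -> int:
    """Parse the job name and dispatch to its runner; unknown names exit with code 2."""
    parser = argparse.ArgumentParser(prog="python -m swisper.jobs.main")
    parser.add_argument("job_name", choices=sorted(JOB_MAP))
    args = parser.parse_args(argv)
    JOB_MAP[args.job_name]()
    return 0

# A module entry point would call sys.exit(main()) under an `if __name__ == "__main__":` guard.
```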
Q: How are jobs scheduled?
A: Externally. In production, Kubernetes CronJobs or similar orchestration tools trigger `python -m swisper.jobs.main <job_name>` on the desired schedule. There is no internal scheduler.
Q: Can I run a job locally?
A: Yes. After initializing your environment, run `python -m swisper.jobs.main <job_name>` directly. The CLI handles configuration and LLM provider initialization.
Q: What happens if a job fails?
A: The `BaseJob.run()` method catches exceptions, logs the error with the correlation ID and duration, and returns. The external scheduler is responsible for retries.