Background Jobs: Overview

Audience: Business stakeholders, product owners, analysts, new team members.

What This Module Does

Background jobs are tasks that Swisper runs periodically without any user interaction. They keep the system current by fetching new emails, syncing calendar events, classifying messages, and sending proactive notifications. Without background jobs, Swisper would only know about information that users explicitly provide during conversations.

The system currently includes 11 jobs across three categories:

Data Ingestion

Job	What It Does	Typical Schedule
Ingest Emails	Fetches new emails from Gmail and Office 365 using delta sync, creates embeddings for search	Every few minutes
Ingest Calendar Events	Fetches calendar events, stores attendees and embeddings	Every few minutes
Classify Emails	Uses an LLM to classify unprocessed emails: summary, action items, importance, labels	Every 6 hours

Proactive Notifications

Job	What It Does	Typical Schedule
Daily Briefing	Generates a personalized morning briefing for users in the 7:00–8:59 AM local time window	Hourly
Important Email Notifications	Alerts users about high-importance emails and approaching deadlines	Every few minutes
Pre-Meeting Prep	Sends meeting preparation summaries before upcoming events (30-minute lookahead)	Every 10–30 minutes
Commitment Reminders	Reminds users about commitments with deadlines in the next 24 hours	Every 1–2 hours
Awaiting Response Notifications	Notifies users about sent emails that have not received a reply	Every 1–2 hours

System Maintenance

Job	What It Does	Typical Schedule
Fact Decay	Reduces `computed_relevance` of aging facts to keep the memory system fresh	Periodic
Redis Expiration Monitoring	Checks Redis keys nearing expiry and backs up critical state to PostgreSQL	Periodic
Threema Polling	Polls the Threema Gateway for pending registration tokens to activate new integrations	Every 3 seconds

Who It Serves

Persona	Need
End users	Fresh email/calendar data in conversations, timely proactive notifications, and relevant long-term memory
Operations	Understanding which jobs run, how often, and what they depend on for scheduling and monitoring
Backend developers	Adding new jobs following the `BaseJob` pattern, understanding the job lifecycle and correlation tracing

Key Capabilities

Unified job interface: All jobs extend `BaseJob`, which provides automatic correlation ID generation, structured logging with timing, and error capture. Developers implement a single `execute()` method.
CLI dispatch: Jobs are invoked via `python -m swisper.jobs.main <job_name>`, making them easy to run in Docker, Kubernetes CronJobs, or local development.
External scheduling: The system does not include an internal scheduler. Jobs are triggered by external cron or orchestration tools, so schedules can be adjusted without code changes.
Correlation ID tracing: Each job run generates a unique `job-{uuid}` correlation ID that propagates through all logs, LLM calls, and database operations for that run, enabling end-to-end tracing of a single job execution.
Token tracking integration: Data ingestion jobs initialize token tracking at the start and persist usage to the `background_job_token_usages` table on completion, enabling cost attribution per job type.
LLM-powered notifications: Notification jobs use LLMs to personalize message content based on user context, facts, and preferences. Prompt templates are stored as Markdown files in `swisper/jobs/prompts/`.
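The unified job interface and correlation ID tracing described above can be illustrated with a minimal sketch. This is not the actual `BaseJob` implementation: the `name` attribute, the log format, the boolean return value of `run()`, and the `HelloJob` subclass are all assumptions made for illustration. Only the `job-{uuid}` ID format, the single `execute()` method, and the catch-log-return behavior of `run()` come from this page.

```python
import logging
import time
import uuid
from abc import ABC, abstractmethod

logger = logging.getLogger("jobs_sketch")


class BaseJob(ABC):
    """Illustrative stand-in for the real BaseJob; internals are assumed."""

    name = "base"  # hypothetical attribute used only for log messages

    def __init__(self) -> None:
        # One correlation ID per job instance, in the job-{uuid} format.
        self.correlation_id = f"job-{uuid.uuid4()}"

    @abstractmethod
    def execute(self) -> None:
        """Job-specific work; the only method a subclass must implement."""

    def run(self) -> bool:
        """Call execute() with timing and error capture.

        Returning True/False is an assumption of this sketch; the page only
        states that run() catches exceptions, logs them, and returns.
        """
        started = time.monotonic()
        try:
            self.execute()
        except Exception:
            duration = time.monotonic() - started
            # logger.exception records the traceback alongside the message.
            logger.exception(
                "job=%s correlation_id=%s failed after %.3fs",
                self.name, self.correlation_id, duration,
            )
            return False
        duration = time.monotonic() - started
        logger.info(
            "job=%s correlation_id=%s finished in %.3fs",
            self.name, self.correlation_id, duration,
        )
        return True


class HelloJob(BaseJob):
    """Hypothetical example job."""

    name = "hello"

    def execute(self) -> None:
        logger.info("correlation_id=%s doing work", self.correlation_id)
```

In this sketch, every log line emitted during a run carries the same correlation ID, which is what makes it possible to filter logs down to a single execution.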

How It Fits in the Platform

Integrations: Email and calendar ingestion jobs use OAuth tokens stored by the Integrations module to access Gmail and Office 365 APIs.
Signals & Notifications: All notification jobs deliver their messages through `SignalsService`, which handles channel selection and delivery.
Fact System: The fact decay job maintains the long-term memory system by reducing the relevance of aging facts.
Token Analytics: Job token usage is tracked separately in `background_job_token_usages` and counts toward the user's rate limit.
LLM Adapter: Jobs use `SwisperLLMAdapter` for embeddings and classification, respecting the per-node provider configuration.

Limits and Edge Cases

No internal scheduler. Jobs rely on external scheduling (cron, Kubernetes CronJobs). If the scheduler fails, jobs do not run, and there is no built-in retry or dead-letter mechanism.
Single-instance assumption. Jobs are not designed for concurrent execution of the same job type. Running two instances of `ingest_emails` simultaneously could cause duplicate processing.
Token refresh failures. Email and calendar ingestion jobs depend on valid OAuth tokens. If a token refresh fails (for example, because access was revoked), ingestion fails for that user and the job moves on to the next user.
Notification timezone dependency. The daily briefing job uses the user's timezone to determine the delivery window. Users without a configured timezone may receive briefings at unexpected times.
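The timezone dependency can be made concrete with a small sketch of a delivery-window check. The function name, the constants, and the fallback to UTC for users without a configured timezone are all assumptions for illustration; this page only specifies the 7:00 to 8:59 AM local window and the hourly schedule.

```python
from datetime import datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo

WINDOW_START = time(7, 0)
WINDOW_END = time(9, 0)  # exclusive, so 8:59 is the last minute in the window


def in_briefing_window(now_utc: datetime, tz_name: Optional[str]) -> bool:
    """Return True if now_utc falls in the 7:00-8:59 window in the user's zone.

    Falling back to UTC when tz_name is missing is an assumption of this
    sketch; it shows how a missing timezone shifts the delivery time.
    """
    tz = ZoneInfo(tz_name) if tz_name else timezone.utc
    local = now_utc.astimezone(tz)
    return WINDOW_START <= local.time() < WINDOW_END


# 06:30 UTC on a winter day is 07:30 in Zurich (UTC+1).
moment = datetime(2024, 1, 15, 6, 30, tzinfo=timezone.utc)
zurich_user = in_briefing_window(moment, "Europe/Zurich")
unset_user = in_briefing_window(moment, None)
```

Under these assumptions, `zurich_user` is `True` and `unset_user` is `False` for the same instant: the user without a timezone would instead be picked up when UTC itself reaches 7:00, which may be well outside their actual morning.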

FAQ

Q: How do I add a new background job? A: Create a class extending `BaseJob`, implement the `execute()` method, add a runner function, and register it in the `JOB_MAP` in `job_registry.py`. The job can then be invoked via `python -m swisper.jobs.main <your_job_name>`.
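The registration and dispatch steps in the answer above might look roughly like the following sketch. The shape of `JOB_MAP` (job name mapped to a zero-argument runner), the integer exit codes, and the `run_hello_job` and `main` functions are assumptions for illustration; the real `job_registry.py` and `swisper.jobs.main` may be structured differently.

```python
import sys
from typing import Callable, Dict, List


def run_hello_job() -> int:
    """Hypothetical runner function.

    In the real module a runner would presumably construct the BaseJob
    subclass and call its run() method; here it only prints a message.
    """
    print("hello job ran")
    return 0


# Assumed registry shape: job name -> zero-argument runner function.
JOB_MAP: Dict[str, Callable[[], int]] = {
    "hello": run_hello_job,
}


def main(argv: List[str]) -> int:
    """Dispatch a single job by name and return a process exit code."""
    if len(argv) != 1 or argv[0] not in JOB_MAP:
        known = ", ".join(sorted(JOB_MAP))
        print(f"usage: <job_name>; known jobs: {known}", file=sys.stderr)
        return 2
    return JOB_MAP[argv[0]]()
```

A command-line entry point would then call `sys.exit(main(sys.argv[1:]))`, so that an unknown job name produces a non-zero exit code that an external scheduler can detect.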

Q: How are jobs scheduled? A: Externally. In production, Kubernetes CronJobs or similar orchestration tools trigger `python -m swisper.jobs.main <job_name>` on the desired schedule. There is no internal scheduler.

Q: Can I run a job locally? A: Yes. After initializing your environment, run `python -m swisper.jobs.main <job_name>` directly. The CLI handles configuration and LLM provider initialization.

Q: What happens if a job fails? A: The `BaseJob.run()` method catches exceptions, logs the error with the correlation ID and duration, and returns. The external scheduler is responsible for retries.