
Background Jobs — Overview

Audience: Business stakeholders, product owners, analysts, new team members.


What This Module Does

Background jobs are tasks that Swisper runs periodically without any user interaction. They keep the system current by fetching new emails, syncing calendar events, classifying messages, and sending proactive notifications. Without background jobs, Swisper would only know about information that users explicitly provide during conversations.

The system currently includes 11 jobs across three categories:

Data Ingestion

| Job | What It Does | Typical Schedule |
|---|---|---|
| Ingest Emails | Fetches new emails from Gmail and Office 365 using delta sync and creates embeddings for search | Every few minutes |
| Ingest Calendar Events | Fetches calendar events and stores attendees and embeddings | Every few minutes |
| Classify Emails | Uses an LLM to classify unprocessed emails (summary, action items, importance, labels) | Every 6 hours |
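Delta sync means each run requests only the changes made since a cursor saved by the previous run, rather than re-reading the whole mailbox. For reference, Gmail exposes history IDs and Microsoft Graph exposes delta links for this kind of query. The sketch below shows the cursor pattern with an in-memory stand-in; `FakeMailbox`, `ingest_delta`, and the integer cursor are hypothetical illustrations, not Swisper's ingestion code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeMailbox:
    """Hypothetical stand-in for a provider API that supports delta queries."""

    messages: List[str] = field(default_factory=list)

    def changes_since(self, cursor: int) -> Tuple[List[str], int]:
        # Return only messages added after `cursor`, plus the cursor for next time.
        return self.messages[cursor:], len(self.messages)


def ingest_delta(mailbox: FakeMailbox, cursors: Dict[str, int], user_id: str) -> List[str]:
    """Fetch only what changed since the last run, then advance the stored cursor."""
    cursor = cursors.get(user_id, 0)  # first run: start from the beginning
    new_messages, next_cursor = mailbox.changes_since(cursor)
    # A real job would create embeddings here, before the cursor is advanced.
    cursors[user_id] = next_cursor
    return new_messages
```

In this sketch the cursor is saved only after the batch is handled, so a run that fails midway re-fetches the same batch on the next run instead of skipping it.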

Proactive Notifications

| Job | What It Does | Typical Schedule |
|---|---|---|
| Daily Briefing | Generates a personalized morning briefing for users whose local time is in the 7:00–8:59 AM window | Hourly |
| Important Email Notifications | Alerts users about high-importance emails and approaching deadlines | Every few minutes |
| Pre-Meeting Prep | Sends meeting preparation summaries before upcoming events (30-minute lookahead) | Every 10–30 minutes |
| Commitment Reminders | Reminds users about commitments with deadlines in the next 24 hours | Every 1–2 hours |
| Awaiting Response Notifications | Notifies users about sent emails that have not received a reply | Every 1–2 hours |
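Several of the schedules in this table reduce to two time checks: whether the user's local clock is inside the 7:00–8:59 AM briefing window, and whether an event or deadline falls inside a lookahead window (30 minutes for pre-meeting prep, 24 hours for commitment reminders). The sketch below assumes timezone-aware UTC timestamps and, as a hypothetical fallback, treats users without a configured timezone as UTC; the function and constant names are illustrative only.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Local hours covered by the 7:00-8:59 AM briefing window.
BRIEFING_HOURS = (7, 8)

# Lookahead windows taken from the table above.
PRE_MEETING_LOOKAHEAD = timedelta(minutes=30)
COMMITMENT_LOOKAHEAD = timedelta(hours=24)


def in_briefing_window(now_utc: datetime, user_tz: Optional[tzinfo]) -> bool:
    """Return True if the user's local time is between 7:00 and 8:59."""
    # Hypothetical fallback: users without a configured timezone are treated as UTC.
    tz = user_tz if user_tz is not None else timezone.utc
    return now_utc.astimezone(tz).hour in BRIEFING_HOURS


def due_within(now_utc: datetime, due_utc: datetime, lookahead: timedelta) -> bool:
    """Return True if due_utc falls in the window (now_utc, now_utc + lookahead]."""
    return now_utc < due_utc <= now_utc + lookahead
```

In practice the `tzinfo` would be built from the user's IANA timezone name, for example with `ZoneInfo("Europe/Zurich")` from the standard `zoneinfo` module. The UTC fallback also shows why users without a configured timezone can receive briefings at unexpected local times.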

System Maintenance

| Job | What It Does | Typical Schedule |
|---|---|---|
| Fact Decay | Reduces the computed_relevance of aging facts to keep the memory system fresh | Periodic |
| Redis Expiration Monitoring | Checks Redis keys nearing expiry and backs up critical state to PostgreSQL | Periodic |
| Threema Polling | Polls the Threema Gateway for pending registration tokens to activate new integrations | Every 3 seconds |
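This overview does not specify how fact decay is computed. One common approach is exponential decay by age, where relevance halves over a fixed period. The sketch below assumes that model with a hypothetical 30-day half-life; `decayed_relevance` and its parameters are illustrative and not the actual decay formula.

```python
def decayed_relevance(relevance: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Scale relevance so that it halves every half_life_days of age (assumed model)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days <= 0:
        # Brand-new (or future-dated) facts keep their full relevance.
        return relevance
    return relevance * 0.5 ** (age_days / half_life_days)
```

Because the value is recomputed from each fact's age rather than multiplied in on every run, running the job more or less often does not change the result.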

Who It Serves

| Persona | Need |
|---|---|
| End users | Fresh email/calendar data in conversations, timely proactive notifications, and relevant long-term memory |
| Operations | Understanding which jobs run, how often, and what they depend on for scheduling and monitoring |
| Backend developers | Adding new jobs following the BaseJob pattern, understanding the job lifecycle and correlation tracing |

Key Capabilities

  • Unified job interface — All jobs extend BaseJob, which provides automatic correlation ID generation, structured logging with timing, and error capture. Developers implement a single execute() method.
  • CLI dispatch — Jobs are invoked via python -m swisper.jobs.main <job_name>, making them easy to run in Docker, Kubernetes CronJobs, or local development.
  • External scheduling — The system does not include an internal scheduler. Jobs are triggered by external cron or orchestration tools, which allows flexible scheduling without code changes.
  • Correlation ID tracing — Each job run generates a unique job-{uuid} correlation ID that propagates through all logs, LLM calls, and database operations for that run. This enables end-to-end tracing of a single job execution.
  • Token tracking integration — Data ingestion jobs initialize token tracking at the start and persist usage to the background_job_token_usages table on completion, enabling cost attribution per job type.
  • LLM-powered notifications — Notification jobs use LLMs to personalize message content based on user context, facts, and preferences. Prompt templates are stored as markdown files in swisper/jobs/prompts/.
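The BaseJob contract described above (one correlation ID per run, timing, error capture, and a single `execute()` hook) can be sketched as follows. This is a minimal illustration under assumptions, not Swisper's class: the logger name, the `name` attribute, and the boolean return value of `run()` are hypothetical.

```python
import logging
import time
import uuid
from abc import ABC, abstractmethod

logger = logging.getLogger("jobs")


class BaseJob(ABC):
    """Sketch of the BaseJob pattern: subclasses implement execute() only."""

    name = "base"  # hypothetical label attached to log records

    def __init__(self) -> None:
        self.correlation_id = ""

    @abstractmethod
    def execute(self) -> None:
        """Job-specific work goes here."""

    def run(self) -> bool:
        """Run execute() with a fresh correlation ID, timing, and error capture."""
        # Each run gets its own ID in the documented job-{uuid} format.
        self.correlation_id = f"job-{uuid.uuid4()}"
        extra = {"correlation_id": self.correlation_id, "job": self.name}
        start = time.monotonic()
        logger.info("job started", extra=extra)
        try:
            self.execute()
        except Exception:
            # Log the traceback with the correlation ID and duration, then return.
            logger.exception("job failed after %.2fs", time.monotonic() - start, extra=extra)
            return False
        logger.info("job finished in %.2fs", time.monotonic() - start, extra=extra)
        return True
```

Catching `Exception` rather than `BaseException` lets `KeyboardInterrupt` and `SystemExit` still stop the process, while ordinary failures are logged with the correlation ID and duration.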

How It Fits in the Platform

  • Integrations — Email and calendar ingestion jobs use OAuth tokens stored by the Integrations module to access Gmail and Office 365 APIs.
  • Signals & Notifications — All notification jobs deliver their messages through SignalsService, which handles channel selection and delivery.
  • Fact System — The fact decay job maintains the long-term memory system by reducing the relevance of aging facts.
  • Token Analytics — Job token usage is tracked separately in background_job_token_usages and counts toward the user's rate limit.
  • LLM Adapter — Jobs use SwisperLLMAdapter for embeddings and classification, respecting the per-node provider configuration.

Limits and Edge Cases

  • No internal scheduler. Jobs rely on external scheduling (cron, Kubernetes CronJobs). If the scheduler fails, jobs do not run and there is no built-in retry or dead-letter mechanism.
  • Single-instance assumption. Jobs are not designed for concurrent execution of the same job type. Running two instances of ingest_emails simultaneously could cause duplicate processing.
  • Token refresh failures. Email/calendar ingestion jobs depend on valid OAuth tokens. If a token refresh fails (for example, because the user revoked access), processing fails for that user and the job moves on to the next user.
  • Notification timezone dependency. The daily briefing job uses the user's timezone to determine the delivery window. Users without a configured timezone may receive briefings at unexpected times.
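The per-user failure handling for ingestion can be illustrated with a loop that isolates each user. In this sketch `TokenRefreshError`, `ingest_for_all_users`, and the `ingest_one` callback are hypothetical; only a token refresh failure is skipped, and any other exception still propagates to the caller.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("jobs.ingest")


class TokenRefreshError(Exception):
    """Hypothetical error raised when an OAuth token cannot be refreshed."""


def ingest_for_all_users(
    user_ids: Iterable[str], ingest_one: Callable[[str], None]
) -> Dict[str, List[str]]:
    """Ingest each user independently so one bad token does not stop the run."""
    outcome: Dict[str, List[str]] = {"ok": [], "failed": []}
    for user_id in user_ids:
        try:
            ingest_one(user_id)
        except TokenRefreshError:
            # For example, the user revoked access: record it and move on.
            logger.warning("token refresh failed for user %s; skipping", user_id)
            outcome["failed"].append(user_id)
            continue
        outcome["ok"].append(user_id)
    return outcome
```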

FAQ

Q: How do I add a new background job? A: Create a class extending BaseJob, implement the execute() method, add a runner function, and register it in the JOB_MAP in job_registry.py. The job can then be invoked via python -m swisper.jobs.main <your_job_name>.
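The registration step can be pictured as a dictionary from CLI job names to runner functions plus a small dispatcher. The sketch below is hypothetical apart from the `JOB_MAP` name and the `ingest_emails` job name taken from this page; the real `job_registry.py` and `swisper.jobs.main` may differ, for example in how they initialize configuration and LLM providers.

```python
import sys
from typing import Callable, Dict, List


def run_ingest_emails() -> None:
    # Placeholder runner; a real one would construct the job class and call run().
    print("ingesting emails")


# Hypothetical registry mirroring the JOB_MAP idea: CLI job name -> runner function.
JOB_MAP: Dict[str, Callable[[], None]] = {
    "ingest_emails": run_ingest_emails,
}


def main(argv: List[str]) -> int:
    """Dispatch `python -m swisper.jobs.main <job_name>` style arguments."""
    if len(argv) != 1 or argv[0] not in JOB_MAP:
        known = ", ".join(sorted(JOB_MAP))
        print(f"usage: main <job_name>; known jobs: {known}", file=sys.stderr)
        return 2
    JOB_MAP[argv[0]]()
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Returning a non-zero exit code for an unknown job name lets cron or a Kubernetes CronJob surface the misconfiguration as a failed run.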

Q: How are jobs scheduled? A: Externally. In production, Kubernetes CronJobs or similar orchestration tools trigger python -m swisper.jobs.main <job_name> on the desired schedule. There is no internal scheduler.

Q: Can I run a job locally? A: Yes. After initializing your environment, run python -m swisper.jobs.main <job_name> directly. The CLI handles configuration and LLM provider initialization.

Q: What happens if a job fails? A: The BaseJob.run() method catches exceptions, logs the error with the correlation ID and duration, and returns. The external scheduler is responsible for retries.