ADR-002: Database Separation Strategy¶

Status: ✅ Accepted
Date: 2025-11-01
Deciders: Development Team, Product Owner
Context: Overall Architecture - Data Storage

Context and Problem Statement¶

SwisperStudio needs to store traces, observations, projects, and configuration data. Swisper already has its own database with users, sessions, facts, entities, and application data.

Question: Should SwisperStudio use the same database as Swisper, or have its own separate database?

Decision Drivers¶

Data isolation and security
Independent deployment and scaling
Schema evolution (ability to change schemas independently)
Disaster recovery and backup strategies
Development complexity (one DB is simpler)
Production setup (container separation)

Considered Options¶

Shared Database - SwisperStudio uses Swisper's PostgreSQL instance
Separate Databases - SwisperStudio has its own PostgreSQL instance
Hybrid - Some tables shared, some separate

Decision Outcome¶

Chosen option: "Separate Databases"

Rationale:

SwisperStudio and Swisper are separate services that happen to integrate with each other. They should have clean boundaries and independent lifecycles.

Architecture¶

Production Setup:
├── SwisperStudio Stack
│   ├── swisper_studio_postgres:5433    ← NEW database
│   │   └── Tables: traces, observations, projects
│   └── swisper_studio_backend:8001
│
└── Swisper Stack (existing)
    ├── swisper_postgres:5432           ← EXISTING database
    │   └── Tables: users, sessions, facts, entities, chats
    └── swisper_backend:8000

Communication: API-only (no direct database access)

Swisper Backend ─────HTTP API────► SwisperStudio
                ◄────HTTP API─────

- Swisper sends traces: POST /api/v1/traces
- Swisper gets config: GET /api/v1/config

Positive Consequences¶

✅ Clean separation - Each service owns its data
✅ Independent deployment - Deploy SwisperStudio without touching Swisper
✅ Independent scaling - Scale databases based on different workloads
✅ Security isolation - SwisperStudio has no access to Swisper user data
✅ Schema freedom - Change SwisperStudio schema without affecting Swisper
✅ Disaster recovery - Backup/restore each system independently
✅ Development - Can run SwisperStudio locally without full Swisper stack
✅ Docker clarity - Clear container separation in production

Negative Consequences¶

❌ More infrastructure to manage (two databases)
❌ Can't use SQL joins across systems (must use API calls)
❌ Data synchronization via API (not transactions)
❌ Slightly higher latency (API calls vs. local queries)

Pros and Cons of the Options¶

Option 1: Shared Database¶

Pros: * Simpler infrastructure (one database) * Can use SQL joins * Atomic transactions possible * Lower latency (no API calls)

Cons: * ❌ Tight coupling (schema changes affect both systems) * ❌ Security risk (SwisperStudio has access to user data) * ❌ Deployment coupling (migrate both together) * ❌ Scaling complexity (can't scale independently) * ❌ Unclear ownership (who owns which tables?)

Option 2: Separate Databases ✅ CHOSEN¶

Pros: * ✅ Clean boundaries * ✅ Independent deployment * ✅ Security isolation * ✅ Scalable independently * ✅ Clear ownership

Cons: * More infrastructure complexity * No SQL joins across systems * API latency for data access

Option 3: Hybrid¶

Pros: * Some shared data (e.g., users) * Some isolated data (e.g., traces)

Cons: * ❌ Confusing (which tables are shared?) * ❌ Partial coupling (still have dependencies) * ❌ Worst of both worlds (complexity of both approaches)

Database Contents¶

SwisperStudio Database (`swisper_studio`)¶

Owner: SwisperStudio
Port: 5433 (avoid conflict with Swisper on 5432)

Tables: - projects - Swisper deployment connections - traces - Top-level execution traces - observations - Individual execution steps - prompts - Prompt versions (Phase 3) - config_history - Configuration audit trail (Phase 4) - api_keys - SwisperStudio API keys (Phase 2)

Swisper Database (`swisper`)¶

Owner: Swisper
Port: 5432

Tables: - users - Swisper users - sessions - User sessions - chats - Chat history - facts - User facts - entities - Person, Organization, etc. - llm_node_config - LLM configuration - llm_route - Routing configuration - fact_preloading_config - Preloading settings

Implementation Notes¶

Docker Compose Configuration¶

services:
  # SwisperStudio Database (NEW)
  postgres:
    container_name: swisper_studio_postgres
    ports:
      - "5433:5432"  # External port 5433
    environment:
      POSTGRES_DB: swisper_studio
      POSTGRES_USER: studio_user
      POSTGRES_PASSWORD: studio_pass

  # SwisperStudio Backend
  backend:
    environment:
      DATABASE_URL: postgresql+asyncpg://studio_user:studio_pass@postgres:5432/swisper_studio

API Integration¶

SwisperStudio accesses Swisper data only via API:

# SwisperStudio backend accessing Swisper config
async def get_swisper_config(project: Project, table_name: str):
    """Fetch config from Swisper via API"""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"{project.swisper_url}/api/admin/config/{table_name}",
            headers={"Authorization": f"Bearer {project.swisper_api_key}"}
        )
        return response.json()

Links¶

Related: ADR-004: Data-Driven Admin UI
Docker Config: docker-compose.yml
Database Models: backend/app/models/

Validation¶

Success Metrics: - ✅ SwisperStudio and Swisper can deploy independently - ✅ SwisperStudio database backup/restore doesn't affect Swisper - ✅ Can run SwisperStudio locally without Swisper database - ✅ No schema coupling between systems - ✅ Clear container separation in production

Review Date: After Phase 2 (when we have real production load)

Notes¶

Future Consideration: - If we need to query Swisper user data (e.g., for RBAC), we'll do it via API - If we need real-time sync, we can use webhooks (Swisper → SwisperStudio) - Configuration sync happens via SAP (Swisper Admin Protocol) - see ADR-004

Alternative Rejected: - Federation layer (GraphQL or similar) - Too complex for MVP - Database replication - Creates tight coupling, not suitable for separate services