Greeting System — Architecture

Audience: Architects, tech leads, and senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" It assumes technical fluency but explains domain-specific decisions.

This content was migrated from Documentation/GREETING_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.


Context and Purpose

The Greeting System exists as a separate, fast-path module outside the main LangGraph pipeline. Normal chat messages traverse the full Global Supervisor graph (intent classification → entity resolution → retrieval → planning → UI response), which takes 3–8 seconds. Greetings must appear as soon as the user opens the app, with a latency budget of under 2 seconds to first token. This drives the key architectural decision: greetings bypass the orchestration graph entirely and call the greeting node directly from a dedicated API endpoint.

The second major architectural concern is fact selection intelligence. The system must pick the most relevant, timely, and engaging facts from potentially dozens of stored facts, while avoiding repetition across sessions. This is handled by a priority scoring formula with four configurable components, managed via a runtime-editable database configuration table.


Architecture Overview

graph TD
    subgraph Frontend ["Frontend"]
        OPEN["User Opens App"]
        SSE["SSE Stream\nDisplay"]
    end

    subgraph FrequencyGate ["Frequency Gate"]
        FG{"Last greeting\n>= 4 hours ago?"}
        DEFAULT["Return default\ngreeting"]
    end

    subgraph FactLoading ["Fact Preloading Pipeline"]
        FPS["Fact Preloading\nService"]
        SCORE["Priority Scoring\n(0-100 points)"]
        WARMTH["Warmth Fact\nSelection"]
        REDIS["Redis Cache\n(1h TTL)"]
    end

    subgraph Generation ["Greeting Generation"]
        LANG["Language\nDetection"]
        VARIANT{"Facts\navailable?"}
        FULL["greeting.md\n(personalized)"]
        SIMPLE["greeting_simple.md\n(generic)"]
        LLM["LLM Stream\n(greeting agent_type)"]
    end

    subgraph Config ["Configuration"]
        DB["fact_preloading_config\n(PostgreSQL JSONB)"]
        ADMIN["Admin API\n(PATCH endpoint)"]
    end

    OPEN -->|"POST /api/v1/chats/greeting"| FG
    FG -->|"No (< 4h)"| DEFAULT
    FG -->|"Yes"| FPS
    FPS --> SCORE
    SCORE --> WARMTH
    WARMTH --> REDIS
    REDIS --> LANG
    LANG --> VARIANT
    VARIANT -->|"1+ facts"| FULL
    VARIANT -->|"0 facts"| SIMPLE
    FULL --> LLM
    SIMPLE --> LLM
    LLM -->|"SSE chunks"| SSE
    DB -.->|"scoring params"| SCORE
    ADMIN -.->|"runtime updates"| DB

    DEFAULT --> SSE

The flow is: frontend triggers POST /api/v1/chats/greeting → frequency gate checks if enough time has passed → Fact Preloading Service queries facts from PostgreSQL and applies priority scoring using parameters from the fact_preloading_config table → warmth facts are appended → results cached in Redis → language detected → prompt variant selected → LLM streams greeting via SSE. The greeting time is recorded after streaming completes (to handle React StrictMode double-invocation).
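The frequency gate at the start of this flow can be sketched as a single time comparison. This is a minimal illustration, not the code in greeting_frequency.py: the function name is hypothetical, the 4-hour default is taken from the diagram, the treatment of a never-greeted avatar is an assumption, and GREETING_MIN_DAYS_GAP is not modelled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Default taken from the diagram; the real threshold is configurable.
GREETING_MIN_HOURS_GAP = 4


def is_personalized_greeting_due(
    last_greeting_time: Optional[datetime],
    now: datetime,
    min_hours_gap: float = GREETING_MIN_HOURS_GAP,
) -> bool:
    """Return True when the gate should let the personalized flow run."""
    if last_greeting_time is None:
        # Assumption: an avatar that has never been greeted passes the gate.
        return True
    return now - last_greeting_time >= timedelta(hours=min_hours_gap)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_personalized_greeting_due(now - timedelta(hours=5), now))  # True
print(is_personalized_greeting_due(now - timedelta(hours=1), now))  # False
```

When the function returns False, the endpoint would return the default greeting instead of running fact preloading.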


Component Responsibilities

| Component | Responsibility |
| --- | --- |
| Greeting Node (greeting.py) | Orchestrates the greeting flow: reads preloaded facts from state, determines language, selects prompt variant, calls LLM, yields streamed chunks. |
| Greeting Builder (greeting_builder.py) | Loads .md prompt templates, injects placeholders (name, time of day, facts, language), formats fact metadata with temporal position (PAST/TODAY/UPCOMING), marks used facts for rotation. |
| Greeting Frequency (greeting_frequency.py) | Frequency gate: checks avatars.last_greeting_time against configurable thresholds (GREETING_MIN_HOURS_GAP, GREETING_MIN_DAYS_GAP). Records greeting time after streaming. |
| Fact Preloading Service (fact_preloading.py) | Loads top facts using the priority scoring formula. Queries user_facts table, applies time urgency + type priority + confidence + recency malus scoring. Caches results in Redis (1h TTL). |
| Fact Preloading Config Service (fact_preloading_config.py) | Manages the fact_preloading_config database table. Provides 60-second in-memory caching. All scoring parameters are read from this table. |
| Prompt Templates (prompts/greeting*.md) | Three variants: greeting.md (personalized with facts), greeting_simple.md (warm generic), greeting_voice.md (voice mode). Use {{placeholder}} injection. |
| Admin Fact Config API (admin/fact_config.py) | PATCH endpoint for runtime configuration updates. Changes take effect within 60 seconds (cache TTL). |
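The 60-second in-memory cache in front of the configuration table explains why Admin API changes take up to a minute to become visible. The sketch below shows one way such a cache could work. The class name, the loader callback, and the injectable clock are all hypothetical and are not taken from fact_preloading_config.py.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedConfig:
    """Serve a config dict from memory and reload it once the TTL has elapsed."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # hypothetical: would read the fact_preloading_config row
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Dict[str, Any]] = None
        self._loaded_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Demo with a fake clock and a fake "database" row.
row = {"min_confidence": 0.7}
fake_now = [0.0]
config = CachedConfig(loader=lambda: dict(row), clock=lambda: fake_now[0])

print(config.get()["min_confidence"])  # 0.7
row["min_confidence"] = 0.8            # simulates an Admin API PATCH
fake_now[0] = 30.0
print(config.get()["min_confidence"])  # 0.7, the cached copy has not expired
fake_now[0] = 61.0
print(config.get()["min_confidence"])  # 0.8, reloaded after the TTL
```

Injecting the clock keeps the expiry behaviour testable without sleeping.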

The Fact Scoring System

The scoring formula is the architectural core of the Greeting System. Every fact is scored using four components, all configurable at runtime:

PRIORITY_SCORE = TIME_URGENCY + FACT_TYPE_PRIORITY + CONFIDENCE_SCORE + RECENCY_MALUS
                  (0–50 pts)     (0–30 pts)           (0–20 pts)        (-60 to 0 pts)
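As a worked instance of the formula, the sketch below sums the four components for one fact before and after it has been used in a greeting. The dataclass and its field names are illustrative only, and the component values are taken from the default ranges listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    time_urgency: float        # 0 to 50
    fact_type_priority: float  # 0 to 30
    confidence_score: float    # 0 to 20
    recency_malus: float       # -60 to 0

    @property
    def priority_score(self) -> float:
        return (
            self.time_urgency
            + self.fact_type_priority
            + self.confidence_score
            + self.recency_malus
        )


# An imminent Schedule fact with confidence 0.9 (0.9 x 20 = 18 points).
fresh = ScoreBreakdown(50, 30, 18.0, 0)
used_today = ScoreBreakdown(50, 30, 18.0, -60)

print(fresh.priority_score)       # 98.0
print(used_today.priority_score)  # 38.0
```

The same fact drops from 98 to 38 points once it has been used on the same day, which is what lets lower-ranked facts surface in the next greeting.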

Time Urgency (0–50 points)

Scores facts based on temporal proximity. Computed from the fact's time_anchor date relative to today.

| Time Window | Config Key | Default (days) | Score Config Key | Default (pts) |
| --- | --- | --- | --- | --- |
| Imminent future | time_window_imminent_future | 3 | urgency_score_imminent_future | 50 |
| Imminent past | time_window_imminent_past | 3 | urgency_score_imminent_past | 45 |
| Near future | time_window_near_future | 7 | urgency_score_near_future | 40 |
| Near past | time_window_near_past | 7 | urgency_score_near_past | 30 |
| Recently created | | | urgency_score_recent_creation | 20 |
| Life change (90d) | time_window_life_change | 90 | urgency_score_life_change | 15 |
| Stable (no date) | | | urgency_score_stable | 10 |
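The dated windows in this table can be sketched as a cascade of range checks. Several details here are assumptions, since the table does not specify them: windows are treated as inclusive, today counts as "future", and the highest-scoring matching window wins. The recently-created and life-change tiers are left out because their trigger conditions are not described, so this sketch falls back to the stable score for any dated fact outside the four windows.

```python
from datetime import date
from typing import Mapping, Optional

DEFAULTS = {
    "time_window_imminent_future": 3,
    "time_window_imminent_past": 3,
    "time_window_near_future": 7,
    "time_window_near_past": 7,
    "urgency_score_imminent_future": 50,
    "urgency_score_imminent_past": 45,
    "urgency_score_near_future": 40,
    "urgency_score_near_past": 30,
    "urgency_score_stable": 10,
}


def time_urgency(
    time_anchor: Optional[date],
    today: date,
    cfg: Mapping[str, int] = DEFAULTS,
) -> int:
    if time_anchor is None:
        return cfg["urgency_score_stable"]

    days_ahead = (time_anchor - today).days  # negative means the date has passed

    # Assumption: checks run from highest to lowest score, bounds inclusive.
    if 0 <= days_ahead <= cfg["time_window_imminent_future"]:
        return cfg["urgency_score_imminent_future"]
    if -cfg["time_window_imminent_past"] <= days_ahead < 0:
        return cfg["urgency_score_imminent_past"]
    if 0 <= days_ahead <= cfg["time_window_near_future"]:
        return cfg["urgency_score_near_future"]
    if -cfg["time_window_near_past"] <= days_ahead < 0:
        return cfg["urgency_score_near_past"]

    # Assumption: the recent-creation and life-change tiers are not modelled,
    # so everything else falls back to the stable score.
    return cfg["urgency_score_stable"]


today = date(2024, 6, 10)
print(time_urgency(date(2024, 6, 12), today))  # 50
print(time_urgency(date(2024, 6, 4), today))   # 30
print(time_urgency(None, today))               # 10
```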

Fact Type Priority (0–30 points)

Each of the 12 fact types has a configurable priority score:

| Fact Type | Config Key | Default (pts) |
| --- | --- | --- |
| Schedule | fact_type_priority_Schedule | 30 |
| Travel | fact_type_priority_Travel | 28 |
| Milestone | fact_type_priority_Milestone | 26 |
| Health | fact_type_priority_Health | 20 |
| Relationship | fact_type_priority_Relationship | 18 |
| Pet | fact_type_priority_Pet | 18 |
| Work | fact_type_priority_Work | 14 |
| Hobby | fact_type_priority_Hobby | 14 |
| Learning | fact_type_priority_Learning | 12 |
| Preference | fact_type_priority_Preference | 6 |
| Other | fact_type_priority_Other | 5 |
| Profile | fact_type_priority_Profile | 4 |
| (unknown) | fact_type_priority_default | 5 |
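Because the config keys follow the pattern fact_type_priority_<Type>, the lookup can be sketched as a dictionary read with a fallback. The function name is hypothetical, and the assumption is that an unrecognized type resolves to fact_type_priority_default.

```python
from typing import Mapping

DEFAULT_CONFIG = {
    "fact_type_priority_Schedule": 30,
    "fact_type_priority_Travel": 28,
    "fact_type_priority_Milestone": 26,
    "fact_type_priority_Health": 20,
    "fact_type_priority_Relationship": 18,
    "fact_type_priority_Pet": 18,
    "fact_type_priority_Work": 14,
    "fact_type_priority_Hobby": 14,
    "fact_type_priority_Learning": 12,
    "fact_type_priority_Preference": 6,
    "fact_type_priority_Other": 5,
    "fact_type_priority_Profile": 4,
    "fact_type_priority_default": 5,
}


def fact_type_priority(fact_type: str, cfg: Mapping[str, int] = DEFAULT_CONFIG) -> int:
    # Unknown types fall back to the default entry.
    return cfg.get(f"fact_type_priority_{fact_type}", cfg["fact_type_priority_default"])


print(fact_type_priority("Schedule"))  # 30
print(fact_type_priority("Weather"))   # 5
```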

Confidence Score (0–20 points)

CONFIDENCE_SCORE = fact.confidence × priority_weight_confidence_max

Facts below min_confidence (default: 0.7) are excluded entirely.
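A minimal sketch of the confidence component and its exclusion rule follows. The default of 20 for priority_weight_confidence_max is inferred from the 0–20 point range rather than stated explicitly, and returning None to signal exclusion is an illustrative choice.

```python
from typing import Optional


def confidence_score(
    confidence: float,
    priority_weight_confidence_max: float = 20.0,  # assumption: inferred from the 0-20 range
    min_confidence: float = 0.7,
) -> Optional[float]:
    """Return the confidence points, or None when the fact is excluded."""
    if confidence < min_confidence:
        return None
    return confidence * priority_weight_confidence_max


print(confidence_score(1.0))   # 20.0
print(confidence_score(0.75))  # 15.0
print(confidence_score(0.5))   # None
```

With the default threshold, every fact that survives the filter contributes at least about 14 points, so the lower part of the 0–20 range is only reachable if min_confidence is lowered.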

Recency Malus (Rotation)

A penalty applied to recently-used facts, ensuring greeting variety:

| Days Since Last Used | Config Key | Default (pts) |
| --- | --- | --- |
| Same day | recency_malus_day_1 | -60 |
| 1 day ago | recency_malus_day_2 | -50 |
| 2 days ago | recency_malus_day_3 | -40 |
| 3 days ago | recency_malus_day_4 | -30 |
| 4 days ago | recency_malus_day_5 | -20 |
| 5 days ago | recency_malus_day_6 | -10 |
| 6+ days ago | | 0 |

When a greeting is generated, all included fact IDs are marked with last_used_at = now via a fire-and-forget background task.
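The malus lookup can be sketched from last_used_at as follows. Note the off-by-one in the key names: recency_malus_day_1 applies to the same day. Counting whole calendar days in UTC and treating a never-used fact as unpenalized are assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

DEFAULT_MALUS = {
    "recency_malus_day_1": -60,
    "recency_malus_day_2": -50,
    "recency_malus_day_3": -40,
    "recency_malus_day_4": -30,
    "recency_malus_day_5": -20,
    "recency_malus_day_6": -10,
}


def recency_malus(
    last_used_at: Optional[datetime],
    now: datetime,
    cfg: Mapping[str, int] = DEFAULT_MALUS,
) -> int:
    if last_used_at is None:
        return 0  # assumption: a fact that was never used carries no penalty
    # Assumption: the age is measured in calendar days, not 24-hour periods.
    days_since = max((now.date() - last_used_at.date()).days, 0)
    # day_1 means "same day", so the key index is days_since + 1.
    return cfg.get(f"recency_malus_day_{days_since + 1}", 0)


now = datetime(2024, 6, 10, 9, 0, tzinfo=timezone.utc)
print(recency_malus(now - timedelta(hours=2), now))  # -60
print(recency_malus(now - timedelta(days=3), now))   # -30
print(recency_malus(now - timedelta(days=6), now))   # 0
```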

Warmth Facts

After selecting the top priority-scored facts (default: 3), the system adds up to 1 warmth fact — a stable personal detail from configurable types (warmth_types, default: ["Pet", "Hobby", "Relationship"]). Warmth facts must not have a time_anchor (they're stable, not events) and have their own recency malus applied.
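The two-stage selection described above can be sketched as a sort followed by a filtered pick. This is an illustration under assumptions: the ScoredFact type and select_facts function are hypothetical, priority_score is assumed to already include the recency malus, and the warmth fact is drawn at random from the eligible remainder.

```python
import random
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ScoredFact:
    fact_id: str
    type: str
    priority_score: float
    time_anchor: Optional[date] = None
    is_warmth_fact: bool = False


def select_facts(
    scored: Sequence[ScoredFact],
    top_n: int = 3,
    max_warmth: int = 1,
    warmth_types: Sequence[str] = ("Pet", "Hobby", "Relationship"),
    rng: Optional[random.Random] = None,
) -> List[ScoredFact]:
    rng = rng or random.Random()
    ranked = sorted(scored, key=lambda f: f.priority_score, reverse=True)
    top = ranked[:top_n]

    # Warmth candidates: not already chosen, of a warmth type, and without a date.
    candidates = [
        f for f in ranked[top_n:]
        if f.type in warmth_types and f.time_anchor is None
    ]
    picked = rng.sample(candidates, k=min(max_warmth, len(candidates)))
    return top + [replace(f, is_warmth_fact=True) for f in picked]


facts = [
    ScoredFact("a", "Schedule", 90, date(2024, 6, 11)),
    ScoredFact("b", "Travel", 80, date(2024, 6, 14)),
    ScoredFact("c", "Health", 70),
    ScoredFact("d", "Pet", 20),
    ScoredFact("e", "Work", 15),
]
print([f.fact_id for f in select_facts(facts)])  # ['a', 'b', 'c', 'd']
```

In this example only the Pet fact qualifies as a warmth candidate, so the result is deterministic even though the pick is random in general.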


Data Model

| Structure | Contents | Lifecycle |
| --- | --- | --- |
| fact_preloading_config (PostgreSQL) | Single-row JSONB table with all scoring parameters | Persistent. Initialized by Alembic migration. Updated via Admin API. Read by Fact Preloading Service with 60s cache. |
| user_facts (PostgreSQL) | Stored facts with type, text, confidence, time_anchor, last_used_at, last_mentioned_at | Persistent. Queried by Fact Preloading Service. last_used_at updated after each greeting for rotation. |
| avatars.last_greeting_time (PostgreSQL) | Timestamp of last personalized greeting per avatar | Persistent. Read by frequency gate. Written after greeting stream completes. |
| Redis cache (preloaded:facts:{user_id}:{avatar_id}) | Preloaded facts JSON with 1-hour TTL | Cached. Warmed during greeting so main chat pipeline has facts ready. |
| state["memory_domain"]["preloaded_facts"] | Facts injected into greeting node state | In-flight. Direct injection for zero-latency access during greeting generation. |

Key Design Decisions

Decision 1: Bypass LangGraph — direct node execution

  • Chosen: Greeting triggers call greeting_node() directly from the API endpoint, bypassing the full Global Supervisor pipeline.
  • Rejected: Running greetings through the standard LangGraph orchestration graph.
  • Rationale: Greetings are single-turn with no user message, no intent to classify, and no agents to invoke. The full pipeline adds 3–5 seconds of overhead. Direct execution achieves <2 second time-to-first-token.

Decision 2: Dual fact injection — state + Redis

  • Chosen: Facts are loaded into both the greeting node's state (for immediate use) and Redis (for cache warming).
  • Rejected: State-only or Redis-only approaches.
  • Rationale: The greeting node needs facts immediately (state injection). When the user replies to the greeting, the main chat pipeline needs those same facts without re-querying (Redis warm cache). Dual injection eliminates both latency and race conditions.
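Dual injection can be sketched as one function that writes the same payload to both destinations. The function names and the FakeCache stand-in are hypothetical. The cache parameter only needs a setex(name, time, value) method, which mirrors the redis-py client method of that name; whether the real service uses a synchronous or asynchronous client is not stated here.

```python
import json
from typing import Any, Dict, List, Protocol, Tuple

FACTS_TTL_SECONDS = 3600  # 1-hour TTL from the data model


class SupportsSetex(Protocol):
    def setex(self, name: str, time: int, value: str) -> Any: ...


def facts_cache_key(user_id: str, avatar_id: str) -> str:
    return f"preloaded:facts:{user_id}:{avatar_id}"


def inject_facts(
    state: Dict[str, Any],
    cache: SupportsSetex,
    user_id: str,
    avatar_id: str,
    facts: List[Dict[str, Any]],
) -> None:
    payload = {"facts": facts}
    # 1) State injection: the greeting node reads this without any I/O.
    state.setdefault("memory_domain", {})["preloaded_facts"] = payload
    # 2) Cache warming: the main chat pipeline reads this on the first reply.
    cache.setex(facts_cache_key(user_id, avatar_id), FACTS_TTL_SECONDS, json.dumps(payload))


class FakeCache:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[int, str]] = {}

    def setex(self, name: str, time: int, value: str) -> None:
        self.store[name] = (time, value)


state: Dict[str, Any] = {}
cache = FakeCache()
inject_facts(state, cache, "u1", "av9", [{"fact_id": "f1", "text": "Has a dog"}])
print(sorted(cache.store))  # ['preloaded:facts:u1:av9']
```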

Decision 3: Database-driven scoring configuration

  • Chosen: All scoring parameters stored in a single-row JSONB table (fact_preloading_config), editable via Admin API at runtime.
  • Rejected: Hardcoded constants, environment variables, or YAML config files.
  • Rationale: Scoring parameters need frequent tuning (e.g., adjusting how aggressively the recency malus suppresses facts). A database-backed config with 60-second cache allows product owners to experiment without deployments. The Admin API provides a safe, auditable change mechanism.

Decision 4: Record greeting time after streaming

  • Chosen: last_greeting_time is written to the database after the SSE stream completes, not before.
  • Rejected: Recording before streaming begins.
  • Rationale: React StrictMode can trigger double-invocation of the greeting endpoint. If the time were recorded before streaming, the second invocation would be stopped by the frequency gate and receive only the default greeting. Recording after streaming ensures at least one complete personalized greeting is delivered.
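The ordering in Decision 4 can be sketched with an async generator that forwards chunks and records the timestamp only after the upstream stream is exhausted. The function and callback names are hypothetical, and a plain async iterator stands in for the LLM stream.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def stream_then_record(
    chunks: AsyncIterator[str],
    record_greeting_time: Callable[[], Awaitable[None]],
) -> AsyncIterator[str]:
    async for chunk in chunks:
        yield chunk
    # Reached only if the stream finished; an aborted or cancelled request
    # never gets here, so it does not close the frequency gate.
    await record_greeting_time()


async def fake_llm_stream() -> AsyncIterator[str]:
    for piece in ("Good ", "morning", "!"):
        yield piece


async def main() -> List[str]:
    events: List[str] = []

    async def record() -> None:
        events.append("<recorded>")

    async for chunk in stream_then_record(fake_llm_stream(), record):
        events.append(chunk)
    return events


print(asyncio.run(main()))  # ['Good ', 'morning', '!', '<recorded>']
```

If the first StrictMode invocation is torn down before the stream finishes, nothing is recorded, and the second invocation still passes the gate.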

Interfaces and Contracts

| Interface | Direction | Consumer | Contract |
| --- | --- | --- | --- |
| POST /api/v1/chats/greeting | Inbound | Frontend | Returns SSE stream of greeting chunks. Requires auth token. |
| state["memory_domain"]["preloaded_facts"] | Inbound | Greeting endpoint (from Fact Preloading Service) | Dict with facts list (each fact has text, type, priority_score, time_anchor, fact_id, is_warmth_fact) |
| PATCH /api/v1/admin/config/fact-preloading | Inbound | Admin API | JSON body with config key/value pairs to update |
| Redis preloaded:facts:{user_id}:{avatar_id} | Outbound | Main chat pipeline (context loader) | JSON with preloaded facts, 1-hour TTL |
| avatars.last_greeting_time | Internal | Frequency gate | UTC timestamp, read on greeting request, written after streaming |
| user_facts.last_used_at | Internal | Recency malus computation | UTC timestamp, updated via fire-and-forget background task after greeting |
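The preloaded_facts contract can be written down as typed dictionaries. The field names come from the contract above, but the field types are assumptions (in particular, time_anchor is modelled as an optional ISO date string), and the missing_keys helper is hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, TypedDict


class PreloadedFact(TypedDict):
    fact_id: str
    text: str
    type: str
    priority_score: float
    time_anchor: Optional[str]  # assumption: ISO date string, or None for stable facts
    is_warmth_fact: bool


class PreloadedFacts(TypedDict):
    facts: List[PreloadedFact]


REQUIRED_KEYS = frozenset(PreloadedFact.__annotations__)


def missing_keys(fact: Dict[str, Any]) -> Set[str]:
    """Return the contract fields that a fact dict does not provide."""
    return set(REQUIRED_KEYS - set(fact))


payload: PreloadedFacts = {
    "facts": [
        {
            "fact_id": "f1",
            "text": "Flies to Lisbon on Friday",
            "type": "Travel",
            "priority_score": 96.0,
            "time_anchor": "2024-06-14",
            "is_warmth_fact": False,
        }
    ]
}
print(missing_keys(payload["facts"][0]))               # set()
print(sorted(missing_keys({"fact_id": "f2", "text": "Has a dog"})))
# ['is_warmth_fact', 'priority_score', 'time_anchor', 'type']
```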

Known Trade-offs and Debt

  • No conversation context in greetings: The greeting node has no access to previous chat history. It can only reference stored facts, not ongoing conversations. Adding conversation-aware greetings would require loading chat summaries, increasing latency.
  • Frequency gate is avatar-scoped, not device-scoped: If a user opens Swisper on their phone and laptop within 4 hours, only the first device gets the personalized greeting. The second sees the default. This is acceptable for single-device usage patterns.
  • Warmth fact selection is random within type: When multiple warmth facts exist, the selection is not deterministic. A more sophisticated approach could match warmth facts to the priority facts being mentioned (e.g., connect a pet fact to a health fact).
  • No A/B testing for prompt variants: There's no mechanism to test different greeting styles or fact presentation formats on subsets of users.