Greeting System — Architecture

Audience: Architects, tech leads, senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" Assumes technical fluency but explains domain-specific decisions.

This content was migrated from Documentation/GREETING_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose

The Greeting System exists as a separate, fast-path module outside the main LangGraph pipeline. Normal chat messages traverse the full Global Supervisor graph (intent classification → entity resolution → retrieval → planning → UI response), which takes 3–8 seconds. Greetings must appear instantly when the user opens the app — the latency budget is under 2 seconds to first token. This drives the key architectural decision: greetings bypass the orchestration graph entirely and call the greeting node directly from a dedicated API endpoint.

The second major architectural concern is fact selection intelligence. The system must pick the most relevant, timely, and engaging facts from potentially dozens of stored facts, while avoiding repetition across sessions. This is handled by a priority scoring formula with four configurable components, managed via a runtime-editable database configuration table.

Architecture Overview

```mermaid
graph TD
    subgraph Frontend ["Frontend"]
        OPEN["User Opens App"]
        SSE["SSE Stream\nDisplay"]
    end

    subgraph FrequencyGate ["Frequency Gate"]
        FG{"Last greeting\n>= 4 hours ago?"}
        DEFAULT["Return default\ngreeting"]
    end

    subgraph FactLoading ["Fact Preloading Pipeline"]
        FPS["Fact Preloading\nService"]
        SCORE["Priority Scoring\n(max 100 points)"]
        WARMTH["Warmth Fact\nSelection"]
        REDIS["Redis Cache\n(1h TTL)"]
    end

    subgraph Generation ["Greeting Generation"]
        LANG["Language\nDetection"]
        VARIANT{"Facts\navailable?"}
        FULL["greeting.md\n(personalized)"]
        SIMPLE["greeting_simple.md\n(generic)"]
        LLM["LLM Stream\n(greeting agent_type)"]
    end

    subgraph Config ["Configuration"]
        DB["fact_preloading_config\n(PostgreSQL JSONB)"]
        ADMIN["Admin API\n(PATCH endpoint)"]
    end

    OPEN -->|"POST /api/v1/chats/greeting"| FG
    FG -->|"No (< 4h)"| DEFAULT
    FG -->|"Yes"| FPS
    FPS --> SCORE
    SCORE --> WARMTH
    WARMTH --> REDIS
    REDIS --> LANG
    LANG --> VARIANT
    VARIANT -->|"1+ facts"| FULL
    VARIANT -->|"0 facts"| SIMPLE
    FULL --> LLM
    SIMPLE --> LLM
    LLM -->|"SSE chunks"| SSE
    DB -.->|"scoring params"| SCORE
    ADMIN -.->|"runtime updates"| DB

    DEFAULT --> SSE
```

The flow is:

1. The frontend triggers `POST /api/v1/chats/greeting`.
2. The frequency gate checks whether enough time has passed since the last personalized greeting.
3. The Fact Preloading Service queries facts from PostgreSQL and applies priority scoring, using parameters from the `fact_preloading_config` table.
4. Warmth facts are appended.
5. The results are cached in Redis.
6. The language is detected.
7. The prompt variant is selected.
8. The LLM streams the greeting via SSE.

The greeting time is recorded after streaming completes (to handle React StrictMode double-invocation).
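The request flow above can be sketched as one async generator. This is a minimal sketch under stated assumptions, not the code in `greeting.py`. The `GreetingDeps` bundle and every callable in it are hypothetical stand-ins for the frequency gate, the Fact Preloading Service (with scoring, warmth selection and the Redis write folded into one call), language detection and the LLM stream. `DEFAULT_GREETING` is placeholder text.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, List


# Hypothetical dependency bundle; the real services live in
# greeting_frequency.py, fact_preloading.py and greeting.py.
@dataclass
class GreetingDeps:
    gate_allows: Callable[[], Awaitable[bool]]
    preload_facts: Callable[[], Awaitable[List[dict]]]  # scoring + warmth + Redis write
    detect_language: Callable[[], str]
    stream_llm: Callable[[str, List[dict], str], AsyncIterator[str]]
    record_greeting_time: Callable[[], Awaitable[None]]


DEFAULT_GREETING = "Hello again!"  # placeholder text, not the product copy


async def greeting_flow(deps: GreetingDeps) -> AsyncIterator[str]:
    # Frequency gate: too soon since the last personalized greeting.
    if not await deps.gate_allows():
        yield DEFAULT_GREETING
        return
    # Load, score and cache facts, then pick the prompt variant.
    facts = await deps.preload_facts()
    language = deps.detect_language()
    template = "greeting.md" if facts else "greeting_simple.md"
    # Forward LLM chunks to the SSE layer as they arrive.
    async for chunk in deps.stream_llm(template, facts, language):
        yield chunk
    # Only reached when the stream completed without being cancelled.
    await deps.record_greeting_time()


if __name__ == "__main__":
    async def demo() -> None:
        async def allow() -> bool:
            return True

        async def preload() -> List[dict]:
            return [{"text": "Trip to Rome on Friday"}]

        async def llm(template: str, facts: List[dict], lang: str):
            for part in (f"[{template}/{lang}] ", "Buongiorno!"):
                yield part

        async def record() -> None:
            print("greeting time recorded")

        deps = GreetingDeps(allow, preload, lambda: "it", llm, record)
        print("".join([c async for c in greeting_flow(deps)]))

    asyncio.run(demo())
```

The record step sits after the loop rather than in a `finally` block. That placement is what leaves `last_greeting_time` untouched when a stream is cancelled.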

Component Responsibilities

Component	Responsibility
Greeting Node (`greeting.py`)	Orchestrates the greeting flow: reads preloaded facts from state, determines language, selects prompt variant, calls LLM, yields streamed chunks.
Greeting Builder (`greeting_builder.py`)	Loads `.md` prompt templates, injects placeholders (name, time of day, facts, language), formats fact metadata with temporal position (PAST/TODAY/UPCOMING), marks used facts for rotation.
Greeting Frequency (`greeting_frequency.py`)	Frequency gate: checks `avatars.last_greeting_time` against configurable thresholds (`GREETING_MIN_HOURS_GAP`, `GREETING_MIN_DAYS_GAP`). Records greeting time after streaming.
Fact Preloading Service (`fact_preloading.py`)	Loads top facts using the priority scoring formula. Queries `user_facts` table, applies time urgency + type priority + confidence + recency malus scoring. Caches results in Redis (1h TTL).
Fact Preloading Config Service (`fact_preloading_config.py`)	Manages the `fact_preloading_config` database table. Provides 60-second in-memory caching. All scoring parameters are read from this table.
Prompt Templates (`prompts/greeting*.md`)	Three variants: `greeting.md` (personalized with facts), `greeting_simple.md` (warm generic), `greeting_voice.md` (voice mode). All three use `{{placeholder}}` injection.
Admin Fact Config API (`admin/fact_config.py`)	PATCH endpoint for runtime configuration updates. Changes take effect within 60 seconds (cache TTL).
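The following sketch covers two `greeting_builder.py` duties from the table: labelling a fact PAST, TODAY or UPCOMING, and filling `{{placeholder}}` slots. It assumes facts are plain dicts with `type`, `text` and an optional `time_anchor` date. The bracketed line format produced by `format_facts` and the fail-on-missing-key behavior of `render` are illustrative choices, not the builder's actual output.

```python
import re
from datetime import date
from typing import Dict, List, Optional

# Matches {{name}} and tolerates inner whitespace such as {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def temporal_position(time_anchor: Optional[date], today: date) -> Optional[str]:
    """Classify a dated fact relative to today; undated facts get no label."""
    if time_anchor is None:
        return None
    if time_anchor < today:
        return "PAST"
    if time_anchor == today:
        return "TODAY"
    return "UPCOMING"


def format_facts(facts: List[dict], today: date) -> str:
    """Render one line per fact, e.g. '- [Travel, UPCOMING] Trip to Rome'."""
    lines = []
    for fact in facts:
        position = temporal_position(fact.get("time_anchor"), today)
        label = fact["type"] if position is None else f"{fact['type']}, {position}"
        lines.append(f"- [{label}] {fact['text']}")
    return "\n".join(lines)


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute every {{name}}; a missing value raises instead of leaking braces."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key!r}")
        return values[key]

    return _PLACEHOLDER.sub(substitute, template)
```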

The Fact Scoring System

The scoring formula is the architectural core of the Greeting System. Every fact is scored using four components, all configurable at runtime:

PRIORITY_SCORE = TIME_URGENCY + FACT_TYPE_PRIORITY + CONFIDENCE_SCORE + RECENCY_MALUS
                  (0–50 pts)     (0–30 pts)           (0–20 pts)        (-60 to 0 pts)
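Here is a worked example of the sum. It assumes default config values and a confidence cap of 20 points (the upper bound shown for `CONFIDENCE_SCORE`); the `ScoreParts` dataclass is illustrative only. The example compares two facts:

- a Travel fact two days away that was already used in a greeting today;
- an older, undated Hobby fact not used in the last six days.

The recency malus is large enough to let the stable fact outrank the imminent one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreParts:
    time_urgency: float        # 0 to 50
    fact_type_priority: float  # 0 to 30
    confidence_score: float    # 0 to 20
    recency_malus: float       # -60 to 0

    def total(self) -> float:
        return (self.time_urgency + self.fact_type_priority
                + self.confidence_score + self.recency_malus)


# Travel fact two days away (imminent future), already used in a greeting today.
trip = ScoreParts(time_urgency=50, fact_type_priority=28,
                  confidence_score=0.9 * 20, recency_malus=-60)
# Older undated Hobby fact (stable), not used in the last six days.
hobby = ScoreParts(time_urgency=10, fact_type_priority=14,
                   confidence_score=0.8 * 20, recency_malus=0)

print(trip.total(), hobby.total())  # 36.0 40.0
```

The malus can outweigh the other components, so a total can fall below zero. The reachable range is therefore -60 to 100, not 0 to 100.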

Time Urgency (0–50 points)

Scores facts by temporal proximity, computed from the fact's `time_anchor` date relative to today.

Time Window	Config Key	Default (days)	Score Config Key	Default (pts)
Imminent future	`time_window_imminent_future`	3	`urgency_score_imminent_future`	50
Imminent past	`time_window_imminent_past`	3	`urgency_score_imminent_past`	45
Near future	`time_window_near_future`	7	`urgency_score_near_future`	40
Near past	`time_window_near_past`	7	`urgency_score_near_past`	30
Recently created	—	—	`urgency_score_recent_creation`	20
Life change (90d)	`time_window_life_change`	90	`urgency_score_life_change`	15
Stable (no date)	—	—	`urgency_score_stable`	10
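One way to evaluate the table is sketched below. The table leaves several rules open, so the sketch makes these assumptions:

- Windows are checked top to bottom and the first match wins.
- An anchor dated today counts as imminent future.
- The 90-day life-change window applies in both directions.
- A dated fact outside every window scores 0.
- "Recently created" uses a hypothetical `RECENT_CREATION_DAYS` threshold, because no window key is documented for it.

`DEFAULT_URGENCY` mirrors the defaults above.

```python
from datetime import date
from typing import Dict, Optional

DEFAULT_URGENCY: Dict[str, int] = {
    "time_window_imminent_future": 3, "urgency_score_imminent_future": 50,
    "time_window_imminent_past": 3, "urgency_score_imminent_past": 45,
    "time_window_near_future": 7, "urgency_score_near_future": 40,
    "time_window_near_past": 7, "urgency_score_near_past": 30,
    "urgency_score_recent_creation": 20,
    "time_window_life_change": 90, "urgency_score_life_change": 15,
    "urgency_score_stable": 10,
}
RECENT_CREATION_DAYS = 7  # hypothetical: no config key is documented for this window


def time_urgency(time_anchor: Optional[date], created_on: date, today: date,
                 cfg: Dict[str, int] = DEFAULT_URGENCY) -> int:
    """Return the urgency points for one fact (first matching bucket wins)."""
    if time_anchor is not None:
        delta = (time_anchor - today).days  # positive: future, negative: past
        if 0 <= delta <= cfg["time_window_imminent_future"]:
            return cfg["urgency_score_imminent_future"]
        if 0 < -delta <= cfg["time_window_imminent_past"]:
            return cfg["urgency_score_imminent_past"]
        if 0 < delta <= cfg["time_window_near_future"]:
            return cfg["urgency_score_near_future"]
        if 0 < -delta <= cfg["time_window_near_past"]:
            return cfg["urgency_score_near_past"]
    if (today - created_on).days <= RECENT_CREATION_DAYS:
        return cfg["urgency_score_recent_creation"]
    if time_anchor is None:
        return cfg["urgency_score_stable"]
    if abs((time_anchor - today).days) <= cfg["time_window_life_change"]:
        return cfg["urgency_score_life_change"]
    return 0  # dated, but outside every window
```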

Fact Type Priority (0–30 points)

Each of the 12 fact types has a configurable priority score:

Fact Type	Config Key	Default (pts)
Schedule	`fact_type_priority_Schedule`	30
Travel	`fact_type_priority_Travel`	28
Milestone	`fact_type_priority_Milestone`	26
Health	`fact_type_priority_Health`	20
Relationship	`fact_type_priority_Relationship`	18
Pet	`fact_type_priority_Pet`	18
Work	`fact_type_priority_Work`	14
Hobby	`fact_type_priority_Hobby`	14
Learning	`fact_type_priority_Learning`	12
Preference	`fact_type_priority_Preference`	6
Other	`fact_type_priority_Other`	5
Profile	`fact_type_priority_Profile`	4
(unknown)	`fact_type_priority_default`	5
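The following sketch shows the lookup. It assumes the config is read as a flat key/value dict, as the single-row JSONB table suggests, and that any type without its own key falls back to `fact_type_priority_default`. The helper names are hypothetical.

```python
from typing import Dict

_TYPE_DEFAULTS = {
    "Schedule": 30, "Travel": 28, "Milestone": 26, "Health": 20,
    "Relationship": 18, "Pet": 18, "Work": 14, "Hobby": 14,
    "Learning": 12, "Preference": 6, "Other": 5, "Profile": 4,
}


def default_type_config() -> Dict[str, int]:
    """Build the fact_type_priority_* entries with the documented defaults."""
    cfg = {f"fact_type_priority_{name}": pts for name, pts in _TYPE_DEFAULTS.items()}
    cfg["fact_type_priority_default"] = 5
    return cfg


def fact_type_priority(fact_type: str, cfg: Dict[str, int]) -> int:
    """Keys embed the type name verbatim, so lookups are case-sensitive."""
    return cfg.get(f"fact_type_priority_{fact_type}", cfg["fact_type_priority_default"])


cfg = default_type_config()
cfg["fact_type_priority_Pet"] = 25  # e.g. an admin raises Pet above Health
print(fact_type_priority("Pet", cfg), fact_type_priority("Astrology", cfg))  # 25 5
```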

Confidence Score (0–20 points)

CONFIDENCE_SCORE = fact.confidence × priority_weight_confidence_max

Facts below min_confidence (default: 0.7) are excluded entirely.
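The following sketch covers the filter and the proportional score. It assumes `priority_weight_confidence_max` defaults to 20 (the stated upper bound of this component) and that a fact exactly at `min_confidence` is kept. The function names are hypothetical. With these defaults, every fact that survives the filter earns between 14 and 20 points, so this component spans only 6 points among scored facts.

```python
from typing import Iterable, List


def eligible_facts(facts: Iterable[dict], min_confidence: float = 0.7) -> List[dict]:
    """Drop facts below the confidence floor before any scoring happens."""
    return [f for f in facts if f["confidence"] >= min_confidence]


def confidence_score(confidence: float, weight_max: float = 20.0) -> float:
    """Scale a 0..1 confidence linearly onto 0..weight_max points."""
    return confidence * weight_max


facts = [{"id": 1, "confidence": 0.95}, {"id": 2, "confidence": 0.7},
         {"id": 3, "confidence": 0.65}]
kept = eligible_facts(facts)
print([(f["id"], round(confidence_score(f["confidence"]), 2)) for f in kept])
# [(1, 19.0), (2, 14.0)]
```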

Recency Malus (Rotation, -60 to 0 points)

A penalty applied to recently used facts, ensuring greeting variety. The config keys are 1-based, so `recency_malus_day_1` is the same-day penalty:

Days Since Last Used	Config Key	Default (pts)
Same day	`recency_malus_day_1`	-60
1 day ago	`recency_malus_day_2`	-50
2 days ago	`recency_malus_day_3`	-40
3 days ago	`recency_malus_day_4`	-30
4 days ago	`recency_malus_day_5`	-20
5 days ago	`recency_malus_day_6`	-10
6+ days ago	—	0
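The following sketch shows the lookup. It assumes "days since last used" counts UTC calendar days, so a fact used at 23:59 and rescored at 00:01 is already one day old. It also treats a future `last_used_at` (from clock skew) as same day. Note the off-by-one between the table rows and the config key suffixes. The names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

DEFAULT_MALUS: Dict[str, int] = {
    f"recency_malus_day_{i}": pts
    for i, pts in enumerate((-60, -50, -40, -30, -20, -10), start=1)
}


def recency_malus(last_used_at: Optional[datetime], now: datetime,
                  cfg: Dict[str, int] = DEFAULT_MALUS) -> int:
    """Penalty for a previously used fact; 0 if never used or 6+ days ago."""
    if last_used_at is None:
        return 0
    # Compare UTC calendar dates; negative gaps (clock skew) count as same day.
    days_ago = max(0, (now.astimezone(timezone.utc).date()
                       - last_used_at.astimezone(timezone.utc).date()).days)
    if days_ago >= 6:
        return 0
    # Key day_1 covers "same day" (0 days ago), day_2 covers 1 day ago, and so on.
    return cfg[f"recency_malus_day_{days_ago + 1}"]
```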

When a greeting is generated, all included fact IDs are marked with last_used_at = now via a fire-and-forget background task.

Warmth Facts

After selecting the top priority-scored facts (default: 3), the system adds at most one warmth fact: a stable personal detail drawn from configurable types (`warmth_types`, default `["Pet", "Hobby", "Relationship"]`). Warmth facts must not have a `time_anchor`, because they describe stable details rather than events, and they have their own recency malus applied.
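The following sketch shows the two-stage selection. It assumes each fact dict already carries its `priority_score` and `recency_malus`. The description above does not say how the warmth fact's malus is applied. Here it narrows the candidates to those with the mildest malus before a random pick, which matches the random-within-type behavior noted under Known Trade-offs. All names are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Set

DEFAULT_WARMTH_TYPES = ("Pet", "Hobby", "Relationship")


def pick_warmth_fact(facts: Iterable[dict], exclude_ids: Set[int],
                     warmth_types: Iterable[str], rng: random.Random) -> Optional[dict]:
    types = set(warmth_types)
    candidates = [f for f in facts
                  if f["type"] in types
                  and f.get("time_anchor") is None  # stable details only, no events
                  and f["fact_id"] not in exclude_ids]
    if not candidates:
        return None
    mildest = max(f["recency_malus"] for f in candidates)  # 0 beats -60
    pool = [f for f in candidates if f["recency_malus"] == mildest]
    return dict(rng.choice(pool), is_warmth_fact=True)


def select_greeting_facts(facts: List[dict], top_n: int = 3,
                          warmth_types: Iterable[str] = DEFAULT_WARMTH_TYPES,
                          rng: Optional[random.Random] = None) -> List[dict]:
    """Top-N by priority score, plus at most one warmth fact not already chosen."""
    rng = rng or random.Random()
    ranked = sorted(facts, key=lambda f: f["priority_score"], reverse=True)
    chosen = [dict(f, is_warmth_fact=False) for f in ranked[:top_n]]
    warmth = pick_warmth_fact(facts, {f["fact_id"] for f in chosen}, warmth_types, rng)
    return chosen + ([warmth] if warmth is not None else [])
```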

Data Model

Structure	Contents	Lifecycle
`fact_preloading_config` (PostgreSQL)	Single-row JSONB table with all scoring parameters	Persistent. Initialized by Alembic migration. Updated via Admin API. Read by Fact Preloading Service with 60s cache.
`user_facts` (PostgreSQL)	Stored facts with `type`, `text`, `confidence`, `time_anchor`, `last_used_at`, `last_mentioned_at`	Persistent. Queried by Fact Preloading Service. `last_used_at` updated after each greeting for rotation.
`avatars.last_greeting_time` (PostgreSQL)	Timestamp of last personalized greeting per avatar	Persistent. Read by frequency gate. Written after greeting stream completes.
Redis cache (`preloaded:facts:{user_id}:{avatar_id}`)	Preloaded facts JSON with 1-hour TTL	Cached. Warmed during greeting so main chat pipeline has facts ready.
`state["memory_domain"]["preloaded_facts"]`	Facts injected into greeting node state	In-flight. Direct injection for zero-latency access during greeting generation.

Key Design Decisions

Decision 1: Bypass LangGraph — direct node execution

Chosen: The greeting API endpoint calls greeting_node() directly, bypassing the full Global Supervisor pipeline.
Rejected: Running greetings through the standard LangGraph orchestration graph.
Rationale: Greetings are single-turn with no user message, no intent to classify, and no agents to invoke. The full pipeline adds 3–5 seconds of overhead. Direct execution achieves <2 second time-to-first-token.
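Because the endpoint consumes the greeting node directly, the only work between the node's chunk stream and the HTTP response is Server-Sent Events framing. This minimal sketch assumes the node exposes an async iterator of text chunks; the `greeting_sse` wrapper is hypothetical. In the SSE format, each line of a payload gets its own `data:` field and a blank line ends the event, so a chunk containing newlines still arrives as one event.

```python
import asyncio
from typing import AsyncIterator


def sse_event(data: str) -> str:
    """Frame one payload as an SSE event: one data: line per payload line."""
    lines = data.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


async def greeting_sse(node_chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap the node's chunk stream; no graph, router or supervisor in between."""
    async for chunk in node_chunks:
        yield sse_event(chunk)


if __name__ == "__main__":
    async def fake_node() -> AsyncIterator[str]:
        for chunk in ("Good morning", "!\nReady for Rome?"):
            yield chunk

    async def main() -> None:
        async for event in greeting_sse(fake_node()):
            print(repr(event))

    asyncio.run(main())
```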

Decision 2: Dual fact injection — state + Redis

Chosen: Facts are loaded into both the greeting node's state (for immediate use) and Redis (for cache warming).
Rejected: State-only or Redis-only approaches.
Rationale: The greeting node needs facts immediately (state injection). When the user replies to the greeting, the main chat pipeline needs those same facts without re-querying (Redis warm cache). Dual injection eliminates both latency and race conditions.
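The following sketch shows the dual write. It assumes the Redis payload has the same `{"facts": [...]}` shape as the state entry and that dates are serialized as strings. The `CacheClient` protocol mirrors the `set(name, value, ex=...)` call of redis-py's asyncio client. `inject_facts`, `cache_key` and `InMemoryCache` are hypothetical names.

```python
import asyncio
import json
from typing import Dict, List, Protocol, Tuple

CACHE_TTL_SECONDS = 3600  # 1-hour TTL from the data model


class CacheClient(Protocol):
    async def set(self, name: str, value: str, ex: int) -> object: ...


def cache_key(user_id: str, avatar_id: str) -> str:
    return f"preloaded:facts:{user_id}:{avatar_id}"


async def inject_facts(state: dict, facts: List[dict], cache: CacheClient,
                       user_id: str, avatar_id: str) -> None:
    payload = {"facts": facts}
    # 1) In-flight copy: the greeting node reads it with no I/O at all.
    state.setdefault("memory_domain", {})["preloaded_facts"] = payload
    # 2) Warm cache: the chat pipeline's context loader finds it on the first reply.
    await cache.set(cache_key(user_id, avatar_id),
                    json.dumps(payload, default=str), ex=CACHE_TTL_SECONDS)


class InMemoryCache:
    """Test double standing in for a Redis client."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[str, int]] = {}

    async def set(self, name: str, value: str, ex: int) -> bool:
        self.store[name] = (value, ex)
        return True


if __name__ == "__main__":
    demo_state: dict = {}
    demo_cache = InMemoryCache()
    asyncio.run(inject_facts(demo_state, [{"fact_id": 7, "text": "Has a dog"}],
                             demo_cache, "u1", "a1"))
    print(demo_state["memory_domain"]["preloaded_facts"], demo_cache.store)
```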

Decision 3: Database-driven scoring configuration

Chosen: All scoring parameters stored in a single-row JSONB table (fact_preloading_config), editable via Admin API at runtime.
Rejected: Hardcoded constants, environment variables, or YAML config files.
Rationale: Scoring parameters need frequent tuning (e.g., adjusting how aggressively the recency malus suppresses facts). A database-backed config with 60-second cache allows product owners to experiment without deployments. The Admin API provides a safe, auditable change mechanism.
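The 60-second cache can be as small as a timestamp check. This sketch assumes an async loader that returns the JSONB row as a dict; the `ConfigCache` class and `load` callable are hypothetical. Each process keeps its own copy, so after an Admin API write every worker converges within one TTL, which is where the 60-second bound comes from. Concurrent cache misses may each trigger a load; an `asyncio.Lock` around the reload would prevent that if it matters.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class ConfigCache:
    """Process-local TTL cache for the single-row fact_preloading_config table."""

    def __init__(self, load: Callable[[], Awaitable[dict]], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at = 0.0

    async def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = await self._load()
            self._loaded_at = now
        return self._value

    def invalidate(self) -> None:
        """Force the next get() to reload, e.g. right after a local PATCH."""
        self._value = None


if __name__ == "__main__":
    async def load_from_db() -> dict:
        return {"min_confidence": 0.7}

    print(asyncio.run(ConfigCache(load_from_db).get()))
```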

Decision 4: Record greeting time after streaming

Chosen: last_greeting_time is written to the database after the SSE stream completes, not before.
Rejected: Recording before streaming begins.
Rationale: In development, React StrictMode can trigger double-invocation of the greeting endpoint. If the time were recorded before streaming, the second invocation would hit the frequency gate and receive the default greeting instead of the personalized one. Recording after ensures at least one complete greeting is delivered.
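The following sketch covers both halves of the decision. It assumes a 4-hour `GREETING_MIN_HOURS_GAP` (the threshold shown in the flow diagram) and leaves out `GREETING_MIN_DAYS_GAP`, whose interaction with the hour gap is not described here. Neither invocation records anything until its stream finishes, so both StrictMode invocations pass the gate. The record call sits after the loop rather than in a `finally` block, so if the server closes the generator mid-stream (for example on client disconnect), `last_greeting_time` is never written. The function names are hypothetical.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import AsyncIterator, Awaitable, Callable, Optional

GREETING_MIN_HOURS_GAP = 4.0  # assumed default, matching the diagram's 4-hour gate


def gate_allows(last_greeting_time: Optional[datetime], now: datetime,
                min_hours: float = GREETING_MIN_HOURS_GAP) -> bool:
    """True when no personalized greeting was recorded within min_hours."""
    return (last_greeting_time is None
            or now - last_greeting_time >= timedelta(hours=min_hours))


async def stream_then_record(chunks: AsyncIterator[str],
                             record: Callable[[datetime], Awaitable[None]]) -> AsyncIterator[str]:
    async for chunk in chunks:
        yield chunk
    # Not in `finally`: if the generator is closed at the `yield` above
    # (e.g. the client disconnected), this line never runs.
    await record(datetime.now(timezone.utc))


if __name__ == "__main__":
    async def demo() -> None:
        async def llm() -> AsyncIterator[str]:
            for part in ("Good ", "morning"):
                yield part

        async def record(ts: datetime) -> None:
            print("recorded", ts.isoformat())

        print("".join([c async for c in stream_then_record(llm(), record)]))

    asyncio.run(demo())
```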

Interfaces and Contracts

Interface	Direction	Consumer	Contract
`POST /api/v1/chats/greeting`	Inbound	Frontend	Returns SSE stream of greeting chunks. Requires auth token.
`state["memory_domain"]["preloaded_facts"]`	Inbound	Greeting endpoint (from Fact Preloading Service)	Dict with `facts` list (each fact has `text`, `type`, `priority_score`, `time_anchor`, `fact_id`, `is_warmth_fact`)
`PATCH /api/v1/admin/config/fact-preloading`	Inbound	Admin API	JSON body with config key/value pairs to update
Redis `preloaded:facts:{user_id}:{avatar_id}`	Outbound	Main chat pipeline (context loader)	JSON with preloaded facts, 1-hour TTL
`avatars.last_greeting_time`	Internal	Frequency gate	UTC timestamp, read on greeting request, written after streaming
`user_facts.last_used_at`	Internal	Recency malus computation	UTC timestamp, updated via fire-and-forget background task after greeting
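The `preloaded_facts` contract can be written down as `TypedDict`s with a small runtime check at the boundary. This sketch assumes the field types (integer or string IDs, and a `time_anchor` that is an ISO string or `None`), which the table does not specify. The class and function names are hypothetical.

```python
from typing import List, Optional, TypedDict, Union, cast


class PreloadedFact(TypedDict):
    fact_id: Union[int, str]
    text: str
    type: str
    priority_score: float
    time_anchor: Optional[str]  # assumed ISO date string, None when undated
    is_warmth_fact: bool


class PreloadedFacts(TypedDict):
    facts: List[PreloadedFact]


_REQUIRED = frozenset(PreloadedFact.__annotations__)


def validate_preloaded(payload: dict) -> PreloadedFacts:
    """Check keys only; TypedDict gives static typing, not runtime validation."""
    facts = payload.get("facts")
    if not isinstance(facts, list):
        raise ValueError("'facts' must be a list")
    for i, fact in enumerate(facts):
        if not isinstance(fact, dict):
            raise ValueError(f"fact {i} is not an object")
        missing = _REQUIRED.difference(fact)
        if missing:
            raise ValueError(f"fact {i} is missing {sorted(missing)}")
    return cast(PreloadedFacts, payload)
```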

Known Trade-offs and Debt

No conversation context in greetings: The greeting node has no access to previous chat history. It can only reference stored facts, not ongoing conversations. Adding conversation-aware greetings would require loading chat summaries, increasing latency.
Frequency gate is avatar-scoped, not device-scoped: If a user opens Swisper on their phone and laptop within 4 hours, only the first device gets the personalized greeting. The second sees the default. This is acceptable for single-device usage patterns.
Warmth fact selection is random within type: When multiple warmth facts exist, the selection is not deterministic. A more sophisticated approach could match warmth facts to the priority facts being mentioned (e.g., connect a pet fact to a health fact).
No A/B testing for prompt variants: There's no mechanism to test different greeting styles or fact presentation formats on subsets of users.