Greeting System: Architecture
Audience: Architects, tech leads, senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" Assumes technical fluency but explains domain-specific decisions.
This content was migrated from Documentation/GREETING_SYSTEM.md and restructured into audience sections. Review for accuracy against the current codebase.
Context and Purpose
The Greeting System exists as a separate, fast-path module outside the main LangGraph pipeline. Normal chat messages traverse the full Global Supervisor graph (intent classification → entity resolution → retrieval → planning → UI response), which takes 3–8 seconds. Greetings must appear instantly when the user opens the app — the latency budget is under 2 seconds to first token. This drives the key architectural decision: greetings bypass the orchestration graph entirely and call the greeting node directly from a dedicated API endpoint.
The second major architectural concern is fact selection intelligence. The system must pick the most relevant, timely, and engaging facts from potentially dozens of stored facts, while avoiding repetition across sessions. This is handled by a priority scoring formula with four configurable components, managed via a runtime-editable database configuration table.
Architecture Overview
graph TD
subgraph Frontend ["Frontend"]
OPEN["User Opens App"]
SSE["SSE Stream\nDisplay"]
end
subgraph FrequencyGate ["Frequency Gate"]
FG{"Last greeting\n>= 4 hours ago?"}
DEFAULT["Return default\ngreeting"]
end
subgraph FactLoading ["Fact Preloading Pipeline"]
FPS["Fact Preloading\nService"]
SCORE["Priority Scoring\n(0-100 points)"]
WARMTH["Warmth Fact\nSelection"]
REDIS["Redis Cache\n(1h TTL)"]
end
subgraph Generation ["Greeting Generation"]
LANG["Language\nDetection"]
VARIANT{"Facts\navailable?"}
FULL["greeting.md\n(personalized)"]
SIMPLE["greeting_simple.md\n(generic)"]
LLM["LLM Stream\n(greeting agent_type)"]
end
subgraph Config ["Configuration"]
DB["fact_preloading_config\n(PostgreSQL JSONB)"]
ADMIN["Admin API\n(PATCH endpoint)"]
end
OPEN -->|"POST /api/v1/chats/greeting"| FG
FG -->|"No (< 4h)"| DEFAULT
FG -->|"Yes"| FPS
FPS --> SCORE
SCORE --> WARMTH
WARMTH --> REDIS
REDIS --> LANG
LANG --> VARIANT
VARIANT -->|"1+ facts"| FULL
VARIANT -->|"0 facts"| SIMPLE
FULL --> LLM
SIMPLE --> LLM
LLM -->|"SSE chunks"| SSE
DB -.->|"scoring params"| SCORE
ADMIN -.->|"runtime updates"| DB
DEFAULT --> SSE
The flow is: frontend triggers POST /api/v1/chats/greeting → frequency gate checks if enough time has passed → Fact Preloading Service queries facts from PostgreSQL and applies priority scoring using parameters from the fact_preloading_config table → warmth facts are appended → results cached in Redis → language detected → prompt variant selected → LLM streams greeting via SSE. The greeting time is recorded after streaming completes (to handle React StrictMode double-invocation).
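The frequency gate at the head of this flow can be sketched as below. This is a minimal illustration under stated assumptions: the function name should_personalize and its parameters are hypothetical, only the hours-based threshold is modeled (4 hours, as in the diagram), and a missing timestamp is assumed to let the request through. The actual logic in greeting_frequency.py may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed default, matching the 4-hour threshold shown in the diagram.
MIN_HOURS_GAP = 4


def should_personalize(
    last_greeting_time: Optional[datetime],
    now: datetime,
    min_hours_gap: float = MIN_HOURS_GAP,
) -> bool:
    """Return True when enough time has passed for a personalized greeting."""
    if last_greeting_time is None:
        # Assumption: no recorded greeting means the gate is open.
        return True
    return now - last_greeting_time >= timedelta(hours=min_hours_gap)
```

A caller would return the default greeting when this returns False and continue to fact preloading otherwise.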
Component Responsibilities
| Component | Responsibility |
|---|---|
| Greeting Node (greeting.py) | Orchestrates the greeting flow: reads preloaded facts from state, determines language, selects prompt variant, calls LLM, yields streamed chunks. |
| Greeting Builder (greeting_builder.py) | Loads .md prompt templates, injects placeholders (name, time of day, facts, language), formats fact metadata with temporal position (PAST/TODAY/UPCOMING), marks used facts for rotation. |
| Greeting Frequency (greeting_frequency.py) | Frequency gate: checks avatars.last_greeting_time against configurable thresholds (GREETING_MIN_HOURS_GAP, GREETING_MIN_DAYS_GAP). Records greeting time after streaming. |
| Fact Preloading Service (fact_preloading.py) | Loads top facts using the priority scoring formula. Queries user_facts table, applies time urgency + type priority + confidence + recency malus scoring. Caches results in Redis (1h TTL). |
| Fact Preloading Config Service (fact_preloading_config.py) | Manages the fact_preloading_config database table. Provides 60-second in-memory caching. All scoring parameters are read from this table. |
| Prompt Templates (prompts/greeting*.md) | Three variants: greeting.md (personalized with facts), greeting_simple.md (warm generic), greeting_voice.md (voice mode). Use {{placeholder}} injection. |
| Admin Fact Config API (admin/fact_config.py) | PATCH endpoint for runtime configuration updates. Changes take effect within 60 seconds (cache TTL). |
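To make the {{placeholder}} injection and variant selection concrete, here is a minimal sketch. The helper names render_template and select_variant are hypothetical, the regex-based substitution is an assumption about how greeting_builder.py might work, and the voice variant is left out because its selection rule is not described here.

```python
import re


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} placeholder; unknown placeholders stay as-is."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return values.get(key, match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def select_variant(fact_count: int) -> str:
    """Pick the prompt file: personalized with 1+ facts, generic with none."""
    return "greeting.md" if fact_count >= 1 else "greeting_simple.md"
```

Leaving unknown placeholders untouched makes a missing value visible in the rendered prompt instead of silently producing an empty string.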
The Fact Scoring System
The scoring formula is the architectural core of the Greeting System. Every fact is scored using four components, all configurable at runtime:
PRIORITY_SCORE = TIME_URGENCY (0–50 pts) + FACT_TYPE_PRIORITY (0–30 pts) + CONFIDENCE_SCORE (0–20 pts) + RECENCY_MALUS (-60 to 0 pts)
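As a worked illustration of how the four components combine, the sketch below sums them in a small dataclass. The class name ScoreBreakdown and the sample values are hypothetical; only the additive formula and the component ranges come from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    time_urgency: int         # 0 to 50
    fact_type_priority: int   # 0 to 30
    confidence_score: float   # 0 to 20
    recency_malus: int        # -60 to 0

    @property
    def priority_score(self) -> float:
        """Sum of the four components, as in the formula above."""
        return (
            self.time_urgency
            + self.fact_type_priority
            + self.confidence_score
            + self.recency_malus
        )


# Hypothetical facts: an urgent Travel fact used yesterday vs. an unused stable Health fact.
urgent_but_used = ScoreBreakdown(50, 28, 18.0, -50)
stable_unused = ScoreBreakdown(10, 20, 18.0, 0)
```

With these sample values the recently used fact scores 46 and the unused one scores 48, which shows how the malus can push even an urgent fact below a stable one for a day or two.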
Time Urgency (0–50 points)
Scores facts based on temporal proximity. Computed from the fact's time_anchor date relative to today.
| Time Window | Config Key | Default (days) | Score Config Key | Default (pts) |
|---|---|---|---|---|
| Imminent future | time_window_imminent_future | 3 | urgency_score_imminent_future | 50 |
| Imminent past | time_window_imminent_past | 3 | urgency_score_imminent_past | 45 |
| Near future | time_window_near_future | 7 | urgency_score_near_future | 40 |
| Near past | time_window_near_past | 7 | urgency_score_near_past | 30 |
| Recently created | n/a | n/a | urgency_score_recent_creation | 20 |
| Life change (90d) | time_window_life_change | 90 | urgency_score_life_change | 15 |
| Stable (no date) | n/a | n/a | urgency_score_stable | 10 |
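The window lookup could be sketched as follows. Several points here are assumptions rather than documented behavior: the function name time_urgency, the top-to-bottom precedence of the tiers, treating today as part of the imminent future window, measuring the life-change window in either direction, and falling back to the stable score beyond 90 days. The "recently created" tier is omitted because its window is not specified in the table.

```python
from datetime import date
from typing import Mapping, Optional

# Defaults copied from the table above.
URGENCY_DEFAULTS: Mapping[str, int] = {
    "time_window_imminent_future": 3,
    "time_window_imminent_past": 3,
    "time_window_near_future": 7,
    "time_window_near_past": 7,
    "time_window_life_change": 90,
    "urgency_score_imminent_future": 50,
    "urgency_score_imminent_past": 45,
    "urgency_score_near_future": 40,
    "urgency_score_near_past": 30,
    "urgency_score_life_change": 15,
    "urgency_score_stable": 10,
}


def time_urgency(
    time_anchor: Optional[date],
    today: date,
    cfg: Mapping[str, int] = URGENCY_DEFAULTS,
) -> int:
    """Map a fact's time_anchor to urgency points (assumed tier precedence)."""
    if time_anchor is None:
        return cfg["urgency_score_stable"]
    delta = (time_anchor - today).days  # positive: future, negative: past
    if 0 <= delta <= cfg["time_window_imminent_future"]:
        return cfg["urgency_score_imminent_future"]
    if -cfg["time_window_imminent_past"] <= delta < 0:
        return cfg["urgency_score_imminent_past"]
    if 0 < delta <= cfg["time_window_near_future"]:
        return cfg["urgency_score_near_future"]
    if -cfg["time_window_near_past"] <= delta < 0:
        return cfg["urgency_score_near_past"]
    if abs(delta) <= cfg["time_window_life_change"]:
        return cfg["urgency_score_life_change"]
    # Assumption: anchors outside every window score like stable facts.
    return cfg["urgency_score_stable"]
```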
Fact Type Priority (0–30 points)
Each of the 12 fact types has a configurable priority score:
| Fact Type | Config Key | Default (pts) |
|---|---|---|
| Schedule | fact_type_priority_Schedule | 30 |
| Travel | fact_type_priority_Travel | 28 |
| Milestone | fact_type_priority_Milestone | 26 |
| Health | fact_type_priority_Health | 20 |
| Relationship | fact_type_priority_Relationship | 18 |
| Pet | fact_type_priority_Pet | 18 |
| Work | fact_type_priority_Work | 14 |
| Hobby | fact_type_priority_Hobby | 14 |
| Learning | fact_type_priority_Learning | 12 |
| Preference | fact_type_priority_Preference | 6 |
| Other | fact_type_priority_Other | 5 |
| Profile | fact_type_priority_Profile | 4 |
| (unknown) | fact_type_priority_default | 5 |
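A lookup with fallback for unknown types might look like the sketch below. The key names and defaults are taken from the table; the function name fact_type_priority and the dictionary-based config shape are assumptions for illustration.

```python
from typing import Mapping

# Defaults copied from the table above.
TYPE_PRIORITY_DEFAULTS: Mapping[str, int] = {
    "fact_type_priority_Schedule": 30,
    "fact_type_priority_Travel": 28,
    "fact_type_priority_Milestone": 26,
    "fact_type_priority_Health": 20,
    "fact_type_priority_Relationship": 18,
    "fact_type_priority_Pet": 18,
    "fact_type_priority_Work": 14,
    "fact_type_priority_Hobby": 14,
    "fact_type_priority_Learning": 12,
    "fact_type_priority_Preference": 6,
    "fact_type_priority_Other": 5,
    "fact_type_priority_Profile": 4,
    "fact_type_priority_default": 5,
}


def fact_type_priority(
    fact_type: str,
    cfg: Mapping[str, int] = TYPE_PRIORITY_DEFAULTS,
) -> int:
    """Return the configured points for a fact type, or the default for unknown types."""
    return cfg.get(f"fact_type_priority_{fact_type}", cfg["fact_type_priority_default"])
```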
Confidence Score (0–20 points)
Facts below min_confidence (default: 0.7) are excluded entirely.
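The exclusion rule and one possible mapping to points are sketched below. Note the assumption: this document does not state how confidence is converted to 0–20 points, so the linear scaling (confidence × 20) is a guess for illustration only. The function name confidence_score is hypothetical.

```python
from typing import Optional

MIN_CONFIDENCE = 0.7        # default from this section
MAX_CONFIDENCE_POINTS = 20  # upper bound of the component range


def confidence_score(
    confidence: float,
    min_confidence: float = MIN_CONFIDENCE,
) -> Optional[float]:
    """Return points for a fact, or None when the fact must be excluded."""
    if confidence < min_confidence:
        return None  # excluded entirely, never scored
    # Assumption: linear scaling; the real mapping is not documented here.
    return round(confidence * MAX_CONFIDENCE_POINTS, 2)
```

Returning None rather than 0 keeps "excluded" distinct from "scored low", so callers can filter before ranking.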
Recency Malus (Rotation)
A penalty applied to recently-used facts, ensuring greeting variety:
| Days Since Last Used | Config Key | Default (pts) |
|---|---|---|
| Same day | recency_malus_day_1 | -60 |
| 1 day ago | recency_malus_day_2 | -50 |
| 2 days ago | recency_malus_day_3 | -40 |
| 3 days ago | recency_malus_day_4 | -30 |
| 4 days ago | recency_malus_day_5 | -20 |
| 5 days ago | recency_malus_day_6 | -10 |
| 6+ days ago | n/a | 0 |
When a greeting is generated, all included fact IDs are marked with last_used_at = now via a fire-and-forget background task.
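The malus lookup can be sketched as a small function. It assumes that "days since last used" means calendar days between two UTC timestamps and that a fact never used carries no penalty; neither detail is confirmed here, and the function name recency_malus is hypothetical.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional

# Defaults copied from the table above; key day_N covers N-1 days ago.
RECENCY_MALUS_DEFAULTS: Mapping[str, int] = {
    "recency_malus_day_1": -60,
    "recency_malus_day_2": -50,
    "recency_malus_day_3": -40,
    "recency_malus_day_4": -30,
    "recency_malus_day_5": -20,
    "recency_malus_day_6": -10,
}


def recency_malus(
    last_used_at: Optional[datetime],
    now: datetime,
    cfg: Mapping[str, int] = RECENCY_MALUS_DEFAULTS,
) -> int:
    """Return the penalty (0 or negative) for a fact based on its last use."""
    if last_used_at is None:
        return 0  # assumption: never-used facts are not penalized
    days_since = max((now.date() - last_used_at.date()).days, 0)
    # Six or more days ago has no config key, so the penalty falls back to 0.
    return cfg.get(f"recency_malus_day_{days_since + 1}", 0)
```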
Warmth Facts
After selecting the top priority-scored facts (default: 3), the system adds up to 1 warmth fact — a stable personal detail from configurable types (warmth_types, default: ["Pet", "Hobby", "Relationship"]). Warmth facts must not have a time_anchor (they're stable, not events) and have their own recency malus applied.
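The selection step might be sketched as follows. This is not the service's implementation: the function name select_facts and the dict-based fact shape are assumptions, each priority_score is assumed to already include the recency malus, and the warmth candidate is picked deterministically by score here for reproducibility, whereas the system's own choice within a type is described as random.

```python
WARMTH_TYPES = ("Pet", "Hobby", "Relationship")  # default warmth_types


def select_facts(
    scored_facts: list[dict],
    top_n: int = 3,
    max_warmth: int = 1,
    warmth_types: tuple[str, ...] = WARMTH_TYPES,
) -> list[dict]:
    """Return the top-N facts plus up to max_warmth stable warmth facts."""
    ranked = sorted(scored_facts, key=lambda f: f["priority_score"], reverse=True)
    top = ranked[:top_n]
    chosen_ids = {f["fact_id"] for f in top}
    # Warmth candidates: allowed type, no time_anchor, not already selected.
    candidates = [
        f
        for f in ranked
        if f["fact_id"] not in chosen_ids
        and f["type"] in warmth_types
        and f.get("time_anchor") is None
    ]
    warmth = [{**f, "is_warmth_fact": True} for f in candidates[:max_warmth]]
    return [{**f, "is_warmth_fact": False} for f in top] + warmth
```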
Data Model
| Structure | Contents | Lifecycle |
|---|---|---|
| fact_preloading_config (PostgreSQL) | Single-row JSONB table with all scoring parameters | Persistent. Initialized by Alembic migration. Updated via Admin API. Read by Fact Preloading Service with 60s cache. |
| user_facts (PostgreSQL) | Stored facts with type, text, confidence, time_anchor, last_used_at, last_mentioned_at | Persistent. Queried by Fact Preloading Service. last_used_at updated after each greeting for rotation. |
| avatars.last_greeting_time (PostgreSQL) | Timestamp of last personalized greeting per avatar | Persistent. Read by frequency gate. Written after greeting stream completes. |
| Redis cache (preloaded:facts:{user_id}:{avatar_id}) | Preloaded facts JSON with 1-hour TTL | Cached. Warmed during greeting so main chat pipeline has facts ready. |
| state["memory_domain"]["preloaded_facts"] | Facts injected into greeting node state | In-flight. Direct injection for zero-latency access during greeting generation. |
Key Design Decisions
Decision 1: Bypass LangGraph — direct node execution
- Chosen: Greeting triggers call greeting_node() directly from the API endpoint, bypassing the full Global Supervisor pipeline.
- Rejected: Running greetings through the standard LangGraph orchestration graph.
- Rationale: Greetings are single-turn with no user message, no intent to classify, and no agents to invoke. The full pipeline adds 3–5 seconds of overhead. Direct execution achieves <2 second time-to-first-token.
Decision 2: Dual fact injection — state + Redis
- Chosen: Facts are loaded into both the greeting node's state (for immediate use) and Redis (for cache warming).
- Rejected: State-only or Redis-only approaches.
- Rationale: The greeting node needs facts immediately (state injection). When the user replies to the greeting, the main chat pipeline needs those same facts without re-querying (Redis warm cache). Dual injection eliminates both latency and race conditions.
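A minimal sketch of dual injection follows. To stay runnable without a Redis server it uses an in-memory stand-in whose setex(key, ttl, value) method mirrors the argument order of the redis-py client; the class InMemoryCache and the helpers cache_key and inject_facts are hypothetical names, while the key format, state path, and 1-hour TTL come from this document.

```python
import json

CACHE_TTL_SECONDS = 3600  # 1-hour TTL


class InMemoryCache:
    """Stand-in for a Redis client; records what would be written."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[int, str]] = {}

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self.entries[key] = (ttl_seconds, value)


def cache_key(user_id: str, avatar_id: str) -> str:
    """Build the cache key in the documented format."""
    return f"preloaded:facts:{user_id}:{avatar_id}"


def inject_facts(state: dict, cache, user_id: str, avatar_id: str, facts: list[dict]) -> dict:
    """Write the same payload to the node state and to the cache."""
    payload = {"facts": facts}
    # 1) State injection: the greeting node reads this without any I/O.
    state.setdefault("memory_domain", {})["preloaded_facts"] = payload
    # 2) Cache warming: the main chat pipeline reads this on the user's reply.
    cache.setex(cache_key(user_id, avatar_id), CACHE_TTL_SECONDS, json.dumps(payload))
    return state
```

Writing one payload to both destinations keeps the greeting and the follow-up chat turn looking at the same set of facts.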
Decision 3: Database-driven scoring configuration
- Chosen: All scoring parameters stored in a single-row JSONB table (fact_preloading_config), editable via Admin API at runtime.
- Rejected: Hardcoded constants, environment variables, or YAML config files.
- Rationale: Scoring parameters need frequent tuning (e.g., adjusting how aggressively the recency malus suppresses facts). A database-backed config with 60-second cache allows product owners to experiment without deployments. The Admin API provides a safe, auditable change mechanism.
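The 60-second cache in front of the config table could be sketched like this. The class name CachedConfig, the loader callback, and the injectable clock are assumptions made for illustration and testability; the real config service may be structured differently.

```python
import time
from typing import Callable, Optional


class CachedConfig:
    """Serve a config dict from memory, reloading once the TTL has elapsed."""

    def __init__(
        self,
        loader: Callable[[], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # e.g. a function that reads the JSONB row
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            # Cache miss or expired entry: reload from the backing store.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

With this shape, a change made through the Admin API becomes visible to scoring at the next reload, which is at most one TTL later.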
Decision 4: Record greeting time after streaming
- Chosen: last_greeting_time is written to the database after the SSE stream completes, not before.
- Rejected: Recording before streaming begins.
- Rationale: React StrictMode can trigger double-invocation of the greeting endpoint. If the time were recorded before streaming, the second invocation would hit the frequency gate and return a blank greeting. Recording after ensures at least one complete greeting is delivered.
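The ordering can be illustrated with an async generator that records the timestamp only after the last chunk has been yielded. The names stream_greeting and record_greeting_time are hypothetical, and the sketch assumes that a stream abandoned partway through should not count as a delivered greeting.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_greeting(
    chunks: AsyncIterator[str],
    record_greeting_time: Callable[[], Awaitable[None]],
) -> AsyncIterator[str]:
    """Yield greeting chunks, then record the greeting time once done."""
    async for chunk in chunks:
        yield chunk
    # Reached only when the stream ran to completion; a consumer that
    # stops early never triggers the write.
    await record_greeting_time()
```

Because the write happens after the loop, two overlapping invocations can both stream a full greeting before either one closes the frequency gate.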
Interfaces and Contracts
| Interface | Direction | Consumer | Contract |
|---|---|---|---|
| POST /api/v1/chats/greeting | Inbound | Frontend | Returns SSE stream of greeting chunks. Requires auth token. |
| state["memory_domain"]["preloaded_facts"] | Inbound | Greeting endpoint (from Fact Preloading Service) | Dict with facts list (each fact has text, type, priority_score, time_anchor, fact_id, is_warmth_fact) |
| PATCH /api/v1/admin/config/fact-preloading | Inbound | Admin API | JSON body with config key/value pairs to update |
| Redis preloaded:facts:{user_id}:{avatar_id} | Outbound | Main chat pipeline (context loader) | JSON with preloaded facts, 1-hour TTL |
| avatars.last_greeting_time | Internal | Frequency gate | UTC timestamp, read on greeting request, written after streaming |
| user_facts.last_used_at | Internal | Recency malus computation | UTC timestamp, updated via fire-and-forget background task after greeting |
Known Trade-offs and Debt
- No conversation context in greetings: The greeting node has no access to previous chat history. It can only reference stored facts, not ongoing conversations. Adding conversation-aware greetings would require loading chat summaries, increasing latency.
- Frequency gate is avatar-scoped, not device-scoped: If a user opens Swisper on their phone and laptop within 4 hours, only the first device gets the personalized greeting. The second sees the default. This is acceptable for single-device usage patterns.
- Warmth fact selection is random within type: When multiple warmth facts exist, the selection is not deterministic. A more sophisticated approach could match warmth facts to the priority facts being mentioned (e.g., connect a pet fact to a health fact).
- No A/B testing for prompt variants: There's no mechanism to test different greeting styles or fact presentation formats on subsets of users.