Intent Classification — Architecture¶
Audience: Architects, tech leads, senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" Assumes technical fluency but explains domain-specific decisions.
This content was migrated from
Documentation/INTENT_CLASSIFICATION.mdand restructured into audience sections. Review for accuracy against the current codebase.
Context and Purpose¶
Intent Classification exists as a dedicated node to isolate the routing decision from conversation execution. The Global Supervisor needs to know — before invoking any downstream agents or tools — whether a message requires simple conversational handling or complex multi-step orchestration. By concentrating this decision in a single LLM call at the pipeline entry point, the system avoids the cost of running the full agent pipeline for messages that don't need it. This separation also means the routing logic (which model to use, what prompt, what schema) can be tuned and versioned independently of the conversation and tool execution nodes.
A key architectural driver is latency optimization: the module produces skip flags that allow downstream nodes to be bypassed entirely, saving 2–3 seconds and up to 13,000 tokens per request on simple queries.
Architecture Overview¶
graph TD
subgraph GlobalSupervisor ["Global Supervisor Pipeline"]
direction TB
IC["Intent Classification\nNode"]
ER["Entity\nResolution"]
SR["Semantic\nRetrieval"]
TR["Temporal\nRetrieval"]
FE["Fact\nExtraction"]
PE["Preference\nExtraction"]
PL["Planner"]
UI["UI Response"]
end
subgraph IntentClassification ["Intent Classification Internals"]
direction TB
PB["Prompt Builder\n(static/dynamic split)"]
LLM["LLM Call\n(structured output)"]
VAL["Flag Invariant\nValidator"]
SCHEMA["OptimizedIntentResult\nSchema"]
end
UM["User Message"] --> IC
IC --> PB
PB -->|"system prompt\n(cached)"| LLM
PB -->|"user content\n(per-request)"| LLM
LLM -->|"structured JSON"| SCHEMA
SCHEMA --> VAL
VAL -->|"validated result"| IC
IC -->|"route + entities"| ER
IC -->|"needs_semantic_retrieval"| SR
IC -->|"is_temporal_query"| TR
IC -->|"has_extractable_facts"| FE
IC -->|"has_preferences"| PE
IC -->|"route=complex_chat"| PL
PL --> UI
IC -->|"route=simple_chat"| UI
The Intent Classification node sits at the top of the Global Supervisor pipeline. Internally, it uses a prompt builder to split the classification prompt into a static system portion (cacheable across all requests) and a dynamic user portion (unique per request). The LLM returns a structured JSON result conforming to the OptimizedIntentResult Pydantic schema. A flag invariant validator then auto-corrects known LLM error patterns (e.g., entities present but needs_semantic_retrieval set to false). The validated result is written to the pipeline state, where downstream nodes read their respective flags to decide whether to execute or skip.
Component Responsibilities¶
| Component | Responsibility |
|---|---|
Intent Classification Node (intent_classification.py) |
Orchestrates the classification flow: extracts state, calls prompt builder, invokes LLM, applies validation, updates state. Entry point for the module. |
Prompt Builder (prompt_builder.py) |
Splits the prompt template into static (cacheable) and dynamic (per-request) parts. Injects conversation context, previously mentioned entities, and the current message into the dynamic portion. |
Prompt Template (prompts/intent_classification.md) |
Markdown-based prompt containing system instructions, decision rules, entity extraction logic, examples, and anti-patterns. Uses [SYSTEM], [DEVELOPER], and [USER] section markers. |
| OptimizedIntentResult Schema (Pydantic model) | Defines the structured output contract: route, temporal flags, entity list, privacy mode, and three optimization flags. Enforces types and defaults. |
| ExtractedEntity Schema (Pydantic model) | Defines the entity extraction contract: text, type (name/role/pronoun/pet), relationship role, pronoun resolution, and singleton flag. |
Flag Invariant Validator (_validate_flag_invariants()) |
Post-LLM guard that enforces logical rules the LLM may violate. Currently enforces: if entities exist, needs_semantic_retrieval must be true. |
Async Preference Extraction (_extract_and_store_preferences_async()) |
Fire-and-forget background task triggered when has_preferences is true. Extracts structured preferences (emoji level, verbosity, tone) via a separate LLM call and stores them in Redis for the next turn. |
Data Model¶
Intent Classification does not persist data to a database. It produces an in-flight result that is written to the GlobalSupervisorState dictionary and consumed by downstream nodes within the same request.
| Structure | Contents | Lifecycle |
|---|---|---|
OptimizedIntentResult |
Route decision, temporal flags, entity list, privacy mode, three optimization flags (has_extractable_facts, has_preferences, needs_semantic_retrieval) |
Created by LLM call, validated, written to state["intent_classification"]. Consumed by downstream nodes in the same pipeline run. Also stored as memory_domain.previous_intent_classification for pronoun resolution on the next turn. |
ExtractedEntity |
Entity text, type, role, resolved name, same_as_previous flag, is_singleton flag |
Nested within OptimizedIntentResult.entities. Passed to Entity Resolution node for disambiguation. |
| Session preferences (Redis) | Emoji level, verbosity, tone, format style, persona, audience context | Written to Redis asynchronously when has_preferences is true. Read by context loader at the start of the next turn. TTL managed by the memory service. |
Key Design Decisions¶
Decision 1: Single LLM call for routing + entity extraction + optimization flags
- Chosen: One structured-output LLM call that returns route, entities, and all optimization flags together.
- Rejected: Separate calls for routing, entity extraction, and flag detection.
- Rationale: Combining all outputs into a single call reduces latency (one round-trip instead of three) and allows the LLM to make coherent cross-field decisions (e.g., "this message has entities, so retrieval is needed"). The trade-off is a more complex prompt and output schema, but the latency savings (~200–400ms per avoided call) justify this.
Decision 2: Static/dynamic prompt split for LLM cache efficiency
- Chosen: Split the prompt into a static system portion (rules, examples, anti-patterns) and a dynamic user portion (conversation context, message). The static portion is identical across all requests.
- Rejected: Single combined prompt rebuilt for every request.
- Rationale: LLM providers (Anthropic, OpenAI, Google) cache prompt prefixes. By keeping the first 1,000+ tokens identical, the system benefits from provider-side prompt caching, reducing cost and latency.
Decision 3: Post-LLM invariant validation rather than constrained decoding
- Chosen: Let the LLM return freely within the schema, then apply invariant corrections (e.g., force
needs_semantic_retrieval=truewhen entities exist). - Rejected: Using constrained decoding or grammar-based generation to prevent invalid combinations.
- Rationale: Pydantic structured output already constrains the LLM to valid field types. The remaining invariants (cross-field logical rules) are simpler to enforce in Python post-processing than to express as generation constraints. This also makes invariants easy to add and debug.
Interfaces and Contracts¶
| Interface | Direction | Consumer | Contract |
|---|---|---|---|
state["intent_classification"] |
Outbound | Global Supervisor routing logic, Entity Resolution, Semantic Retrieval, Temporal Retrieval, Fact Extraction, Preference Extraction | Dict matching OptimizedIntentResult schema: route, is_temporal_query, temporal_query_type, temporal_start_date, temporal_end_date, is_system_query, entities, privacy_mode_change, has_extractable_facts, has_preferences, needs_semantic_retrieval |
state["user_message"] |
Inbound | Session Init node | Plain text string — the current user message to classify |
state["memory_domain"]["previous_intent_classification"] |
Inbound | Previous pipeline run | Previous turn's intent result, used for pronoun resolution via previously_mentioned_entities |
LLMAdapterInterface.get_structured_output() |
Outbound | LLM Gateway | Messages list + Pydantic schema; returns OptimizedIntentResult instance. Uses agent_type="intent_classification" for per-node model overrides. |
| Redis session preferences | Outbound | Memory Service (Redis) | When has_preferences is true, async background task writes structured preferences to Redis keyed by chat_id. |
Breaking change note: The OptimizedIntentResult schema is consumed by every downstream node in the Global Supervisor pipeline. Adding or removing fields requires updating all consumers. The entities list structure is consumed by Entity Resolution — changing ExtractedEntity fields requires coordinated updates.
Known Trade-offs and Debt¶
- Privacy mode disabled: The
privacy_mode_changefield exists in the schema but is hardcoded tonullin the node (hotfix #805/#807). The LLM was unreliably detecting privacy mode toggles, causing false positives. Re-enabling requires improving prompt reliability or adding a separate deterministic check. - Single invariant rule: Only one invariant is currently enforced (entities require retrieval). Other potential invariants (e.g., temporal queries should have
is_temporal_query=true, system queries should haveneeds_semantic_retrieval=false) are not yet implemented. Adding them would catch more LLM errors but requires careful testing to avoid false corrections. - No A/B testing framework: There is no mechanism to test prompt changes or model swaps on a subset of traffic. Prompt updates are deployed to all users simultaneously, making it harder to measure the impact of routing accuracy changes.
- Fallback is conservative but blind: On LLM failure, the fallback defaults to
simple_chatwith retrieval enabled. This means complex requests (email, calendar) will silently fail to route to tool agents during outages, with no user-visible error message.