Intent Classification — Architecture¶

Audience: Architects, tech leads, senior engineers evaluating design decisions and cross-module impact. This document answers "how is this module designed, and why?" Assumes technical fluency but explains domain-specific decisions.

This content was migrated from Documentation/INTENT_CLASSIFICATION.md and restructured into audience sections. Review for accuracy against the current codebase.

Context and Purpose¶

Intent Classification exists as a dedicated node to isolate the routing decision from conversation execution. The Global Supervisor needs to know — before invoking any downstream agents or tools — whether a message requires simple conversational handling or complex multi-step orchestration. By concentrating this decision in a single LLM call at the pipeline entry point, the system avoids the cost of running the full agent pipeline for messages that don't need it. This separation also means the routing logic (which model to use, what prompt, what schema) can be tuned and versioned independently of the conversation and tool execution nodes.

A key architectural driver is latency optimization: the module produces skip flags that allow downstream nodes to be bypassed entirely, saving 2–3 seconds and up to 13,000 tokens per request on simple queries.

Architecture Overview¶

graph TD
    subgraph GlobalSupervisor ["Global Supervisor Pipeline"]
        direction TB
        IC["Intent Classification\nNode"]
        ER["Entity\nResolution"]
        SR["Semantic\nRetrieval"]
        TR["Temporal\nRetrieval"]
        FE["Fact\nExtraction"]
        PE["Preference\nExtraction"]
        PL["Planner"]
        UI["UI Response"]
    end

    subgraph IntentClassification ["Intent Classification Internals"]
        direction TB
        PB["Prompt Builder\n(static/dynamic split)"]
        LLM["LLM Call\n(structured output)"]
        VAL["Flag Invariant\nValidator"]
        SCHEMA["OptimizedIntentResult\nSchema"]
    end

    UM["User Message"] --> IC
    IC --> PB
    PB -->|"system prompt\n(cached)"| LLM
    PB -->|"user content\n(per-request)"| LLM
    LLM -->|"structured JSON"| SCHEMA
    SCHEMA --> VAL
    VAL -->|"validated result"| IC

    IC -->|"route + entities"| ER
    IC -->|"needs_semantic_retrieval"| SR
    IC -->|"is_temporal_query"| TR
    IC -->|"has_extractable_facts"| FE
    IC -->|"has_preferences"| PE
    IC -->|"route=complex_chat"| PL
    PL --> UI
    IC -->|"route=simple_chat"| UI

The Intent Classification node sits at the top of the Global Supervisor pipeline. Internally, it uses a prompt builder to split the classification prompt into a static system portion (cacheable across all requests) and a dynamic user portion (unique per request). The LLM returns a structured JSON result conforming to the OptimizedIntentResult Pydantic schema. A flag invariant validator then auto-corrects known LLM error patterns (e.g., entities present but needs_semantic_retrieval set to false). The validated result is written to the pipeline state, where downstream nodes read their respective flags to decide whether to execute or skip.

Component Responsibilities¶

Component	Responsibility
Intent Classification Node (`intent_classification.py`)	Orchestrates the classification flow: extracts state, calls prompt builder, invokes LLM, applies validation, updates state. Entry point for the module.
Prompt Builder (`prompt_builder.py`)	Splits the prompt template into static (cacheable) and dynamic (per-request) parts. Injects conversation context, previously mentioned entities, and the current message into the dynamic portion.
Prompt Template (`prompts/intent_classification.md`)	Markdown-based prompt containing system instructions, decision rules, entity extraction logic, examples, and anti-patterns. Uses `[SYSTEM]`, `[DEVELOPER]`, and `[USER]` section markers.
OptimizedIntentResult Schema (Pydantic model)	Defines the structured output contract: route, temporal flags, entity list, privacy mode, and three optimization flags. Enforces types and defaults.
ExtractedEntity Schema (Pydantic model)	Defines the entity extraction contract: text, type (name/role/pronoun/pet), relationship role, pronoun resolution, and singleton flag.
Flag Invariant Validator (`_validate_flag_invariants()`)	Post-LLM guard that enforces logical rules the LLM may violate. Currently enforces: if entities exist, `needs_semantic_retrieval` must be true.
Async Preference Extraction (`_extract_and_store_preferences_async()`)	Fire-and-forget background task triggered when `has_preferences` is true. Extracts structured preferences (emoji level, verbosity, tone) via a separate LLM call and stores them in Redis for the next turn.

Data Model¶

Intent Classification does not persist data to a database. It produces an in-flight result that is written to the GlobalSupervisorState dictionary and consumed by downstream nodes within the same request.

Structure	Contents	Lifecycle
`OptimizedIntentResult`	Route decision, temporal flags, entity list, privacy mode, three optimization flags (`has_extractable_facts`, `has_preferences`, `needs_semantic_retrieval`)	Created by LLM call, validated, written to `state["intent_classification"]`. Consumed by downstream nodes in the same pipeline run. Also stored as `memory_domain.previous_intent_classification` for pronoun resolution on the next turn.
`ExtractedEntity`	Entity text, type, role, resolved name, `same_as_previous` flag, `is_singleton` flag	Nested within `OptimizedIntentResult.entities`. Passed to Entity Resolution node for disambiguation.
Session preferences (Redis)	Emoji level, verbosity, tone, format style, persona, audience context	Written to Redis asynchronously when `has_preferences` is true. Read by context loader at the start of the next turn. TTL managed by the memory service.

Key Design Decisions¶

Decision 1: Single LLM call for routing + entity extraction + optimization flags

Chosen: One structured-output LLM call that returns route, entities, and all optimization flags together.
Rejected: Separate calls for routing, entity extraction, and flag detection.
Rationale: Combining all outputs into a single call reduces latency (one round-trip instead of three) and allows the LLM to make coherent cross-field decisions (e.g., "this message has entities, so retrieval is needed"). The trade-off is a more complex prompt and output schema, but the latency savings (~200–400ms per avoided call) justify this.

Decision 2: Static/dynamic prompt split for LLM cache efficiency

Chosen: Split the prompt into a static system portion (rules, examples, anti-patterns) and a dynamic user portion (conversation context, message). The static portion is identical across all requests.
Rejected: Single combined prompt rebuilt for every request.
Rationale: LLM providers (Anthropic, OpenAI, Google) cache prompt prefixes. By keeping the first 1,000+ tokens identical, the system benefits from provider-side prompt caching, reducing cost and latency.

Decision 3: Post-LLM invariant validation rather than constrained decoding

Chosen: Let the LLM return freely within the schema, then apply invariant corrections (e.g., force needs_semantic_retrieval=true when entities exist).
Rejected: Using constrained decoding or grammar-based generation to prevent invalid combinations.
Rationale: Pydantic structured output already constrains the LLM to valid field types. The remaining invariants (cross-field logical rules) are simpler to enforce in Python post-processing than to express as generation constraints. This also makes invariants easy to add and debug.

Interfaces and Contracts¶

Interface	Direction	Consumer	Contract
`state["intent_classification"]`	Outbound	Global Supervisor routing logic, Entity Resolution, Semantic Retrieval, Temporal Retrieval, Fact Extraction, Preference Extraction	Dict matching `OptimizedIntentResult` schema: `route`, `is_temporal_query`, `temporal_query_type`, `temporal_start_date`, `temporal_end_date`, `is_system_query`, `entities`, `privacy_mode_change`, `has_extractable_facts`, `has_preferences`, `needs_semantic_retrieval`
`state["user_message"]`	Inbound	Session Init node	Plain text string — the current user message to classify
`state["memory_domain"]["previous_intent_classification"]`	Inbound	Previous pipeline run	Previous turn's intent result, used for pronoun resolution via `previously_mentioned_entities`
`LLMAdapterInterface.get_structured_output()`	Outbound	LLM Gateway	Messages list + Pydantic schema; returns `OptimizedIntentResult` instance. Uses `agent_type="intent_classification"` for per-node model overrides.
Redis session preferences	Outbound	Memory Service (Redis)	When `has_preferences` is true, async background task writes structured preferences to Redis keyed by `chat_id`.

Breaking change note: The OptimizedIntentResult schema is consumed by every downstream node in the Global Supervisor pipeline. Adding or removing fields requires updating all consumers. The entities list structure is consumed by Entity Resolution — changing ExtractedEntity fields requires coordinated updates.

Known Trade-offs and Debt¶

Privacy mode disabled: The privacy_mode_change field exists in the schema but is hardcoded to null in the node (hotfix #805/#807). The LLM was unreliably detecting privacy mode toggles, causing false positives. Re-enabling requires improving prompt reliability or adding a separate deterministic check.
Single invariant rule: Only one invariant is currently enforced (entities require retrieval). Other potential invariants (e.g., temporal queries should have is_temporal_query=true, system queries should have needs_semantic_retrieval=false) are not yet implemented. Adding them would catch more LLM errors but requires careful testing to avoid false corrections.
No A/B testing framework: There is no mechanism to test prompt changes or model swaps on a subset of traffic. Prompt updates are deployed to all users simultaneously, making it harder to measure the impact of routing accuracy changes.
Fallback is conservative but blind: On LLM failure, the fallback defaults to simple_chat with retrieval enabled. This means complex requests (email, calendar) will silently fail to route to tool agents during outages, with no user-visible error message.