TD-002: LangGraph Pydantic State Serialization Pattern
Status: Identified (Workaround Implemented)
Priority: Medium
Estimated Effort: 1-2 weeks (Option C: State Interface Layer)
Date Identified: 2025-11-27
Identified By: heiko (during HITL bug investigation)
Description
What: We store Pydantic models directly in LangGraph TypedDict state, which conflicts with how LangGraph's checkpoint serialization works. When state is checkpointed to Redis and restored, Pydantic objects come back as plain dictionaries, causing AttributeError when accessing model attributes.
Current State (Anti-Pattern):
# backend/app/api/services/agents/global_supervisor_state.py
class GlobalSupervisorState(TypedDict):
    global_planner_decision: GlobalPlannerDecision  # Pydantic stored directly ❌
    user_in_the_loop: UserInTheLoop  # Pydantic stored directly ❌
    agent_responses: AgentResponses  # Pydantic stored directly ❌
    # ... more Pydantic models in state

# In nodes - accessing attributes fails after checkpoint restore:
def some_node(state):
    decision = state.get("global_planner_decision")
    plan = decision.current_plan  # ❌ AttributeError: 'dict' object has no attribute 'current_plan'
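The failure mode can be reproduced without Redis. The sketch below is a simulation under stated assumptions, not the real checkpointer: a JSON round trip stands in for checkpoint serialization, and `PlannerDecision` is a hypothetical dataclass standing in for `GlobalPlannerDecision` so that the block runs with the standard library only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PlannerDecision:
    """Hypothetical stand-in for GlobalPlannerDecision (not the real model)."""
    current_plan: str


def simulate_checkpoint_round_trip(state: dict) -> dict:
    """Save and restore state the way a persistent checkpointer might.

    Assumption: JSON stands in for the real serializer. Structured objects
    are flattened to dicts on the way out, and nothing rebuilds them on the
    way back in.
    """
    payload = json.dumps(state, default=asdict)
    return json.loads(payload)


state = {"global_planner_decision": PlannerDecision(current_plan="rebalance portfolio")}
restored = simulate_checkpoint_round_trip(state)

decision = restored["global_planner_decision"]
print(type(decision).__name__)  # dict

error_message = ""
try:
    decision.current_plan  # Attribute access on a plain dict
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'dict' object has no attribute 'current_plan'
```

The data itself survives the round trip (`decision["current_plan"]` still works); only the type is lost, which is why the bug stays hidden until a node uses attribute access after a restore.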
Official LangGraph Pattern:
# State should only contain primitives, dicts, lists, messages
class State(TypedDict):
    input: str
    user_feedback: str
    messages: list[BaseMessage]

# Pydantic is ONLY for LLM structured output parsing
class AskHuman(BaseModel):
    question: str

# In nodes - extract values from Pydantic, store as primitives/dicts
def ask_human(state):
    ask = AskHuman.model_validate(llm_output)  # Parse with Pydantic
    location = interrupt(ask.question)  # Use the value
    return {"messages": tool_message}  # Store as dict, not Pydantic
Why It Exists:
- Original implementation predates LangGraph 1.0.x, which uses msgpack serialization
- Pydantic models provide nice validation and autocomplete in IDEs
- Pattern worked locally but failed after checkpoint restore
- Issue surfaced during HITL (Human-in-the-Loop) flows where state is persisted
Workaround Implemented
We've implemented state_helpers.py modules in each agent that safely handle both dict and Pydantic model access:
# backend/app/api/services/agents/global_supervisor/state_helpers.py
def get_planner_decision(state: GlobalSupervisorState) -> GlobalPlannerDecision | None:
    """Get planner_decision from state, handling both dict and Pydantic model."""
    raw = state.get("global_planner_decision")
    if raw is None:
        return None
    if isinstance(raw, dict):
        return GlobalPlannerDecision(**raw)  # Reconstruct Pydantic
    return raw  # Already Pydantic
Files implementing this workaround:
- backend/app/api/services/agents/global_supervisor/state_helpers.py
- backend/app/api/services/agents/wealth_agent/state_helpers.py
- backend/app/api/services/agents/research_agent/state_helpers.py
- backend/app/api/services/agents/productivity_agent/state_helpers.py
- backend/app/api/services/agents/doc_agent/state_helpers.py
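Since every helper follows the same three branches (missing, dict, already a model), the per-field functions could in principle be thin wrappers around one generic function. The following is a minimal sketch of that idea, not the code in the files listed above: `get_model` is a hypothetical name, and `PlannerDecision` is a dataclass stand-in used so the block runs without third-party packages (a Pydantic model is constructed from keyword arguments in the same way).

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Type, TypeVar

T = TypeVar("T")


def get_model(state: Mapping[str, Any], key: str, model_cls: Type[T]) -> Optional[T]:
    """Return state[key] as model_cls, accepting a dict or an existing instance."""
    raw = state.get(key)
    if raw is None:
        return None
    if isinstance(raw, model_cls):
        return raw  # Not yet checkpointed: still the original object
    if isinstance(raw, dict):
        return model_cls(**raw)  # Restored from a checkpoint: rebuild the object
    raise TypeError(
        f"{key!r}: expected dict or {model_cls.__name__}, got {type(raw).__name__}"
    )


@dataclass
class PlannerDecision:
    """Hypothetical stand-in for GlobalPlannerDecision."""
    current_plan: str


fresh = {"global_planner_decision": PlannerDecision(current_plan="plan A")}
restored = {"global_planner_decision": {"current_plan": "plan A"}}

from_fresh = get_model(fresh, "global_planner_decision", PlannerDecision)
from_restored = get_model(restored, "global_planner_decision", PlannerDecision)
```

Unlike the original helper, this sketch raises `TypeError` for unexpected value types instead of returning them unchanged; whether that stricter behavior is wanted is a design choice for the team.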
Impact
Maintainability
- ⚠️ Every new Pydantic field in state needs a helper function
- ⚠️ Easy to forget helpers - new developers might access state directly
- ⚠️ Workaround, not a fix - doesn't address root cause
- ⚠️ Pattern diverges from official LangGraph examples
Performance
- ✅ Minimal Impact - helper functions add negligible overhead
- ✅ Works correctly - HITL resume functions properly now
Testing
- ⚠️ Must test checkpoint scenarios - unit tests without Redis miss the issue
- ⚠️ Integration tests essential - need real Redis checkpointer to catch issues
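A cheap unit-level guard can complement, though not replace, the Redis integration tests: call each helper with both the live-object form and the restored dict form of the state. The sketch below assumes a JSON round trip is a reasonable approximation of save and restore; `PlannerDecision` and the local `get_planner_decision` are simplified stand-ins for the real model and helper.

```python
import json
import unittest
from dataclasses import asdict, dataclass


@dataclass
class PlannerDecision:
    """Hypothetical stand-in for GlobalPlannerDecision."""
    current_plan: str


def get_planner_decision(state: dict):
    """Simplified local copy of the helper pattern from state_helpers.py."""
    raw = state.get("global_planner_decision")
    if isinstance(raw, dict):
        return PlannerDecision(**raw)
    return raw


def round_trip(state: dict) -> dict:
    """Assumption: a JSON round trip approximates checkpoint save and restore."""
    return json.loads(json.dumps(state, default=asdict))


class HelperRoundTripTest(unittest.TestCase):
    def test_helper_returns_model_before_and_after_restore(self):
        state = {"global_planner_decision": PlannerDecision(current_plan="plan A")}
        before = get_planner_decision(state)
        after = get_planner_decision(round_trip(state))
        self.assertIsInstance(after, PlannerDecision)
        self.assertEqual(before, after)

    def test_missing_field_returns_none(self):
        self.assertIsNone(get_planner_decision(round_trip({})))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(HelperRoundTripTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A test like this catches a forgotten `isinstance(raw, dict)` branch early, but it cannot catch problems specific to the Redis checkpointer's configuration, so the integration tests remain necessary.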
Developer Experience
- ⚠️ IDE autocomplete works only when going through the helpers (type hints preserved)
- ⚠️ Easy to make mistakes when adding new state fields
- ⚠️ Documentation burden - must explain the pattern to new developers
Overall Impact: Medium - System works with workaround, but architectural debt accumulates.
Remediation Options
Option A: Full Refactor (Not Recommended Now)
Convert all state fields to plain dicts, never store Pydantic in state.
- Effort: 2-3 weeks
- Risk: High (touching all agents)
- When: If adding 3+ new agents or major state model changes
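As a rough illustration of what Option A implies for node code, the sketch below keeps only plain dicts in state and rebuilds a typed object at the node boundary. `PlannerDecision`, `planner_node`, `executor_node`, and the `next_step` key are hypothetical names; a dataclass stands in for the Pydantic model, where the equivalent Pydantic v2 calls would be `model_dump()` and `model_validate()`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class PlannerDecision:
    """Hypothetical stand-in for a model such as GlobalPlannerDecision."""
    current_plan: str
    steps: list[str] = field(default_factory=list)


def planner_node(state: dict[str, Any]) -> dict[str, Any]:
    """Produce a decision and write it to state as a plain dict."""
    decision = PlannerDecision(
        current_plan="summarize documents", steps=["fetch", "summarize"]
    )
    return {"global_planner_decision": asdict(decision)}


def executor_node(state: dict[str, Any]) -> dict[str, Any]:
    """Rebuild the typed object from the dict, use it, write primitives back."""
    decision = PlannerDecision(**state["global_planner_decision"])
    return {"next_step": decision.steps[0] if decision.steps else None}


# Apply the node updates by hand; in LangGraph the framework merges them.
state: dict[str, Any] = {}
state.update(planner_node(state))
state.update(executor_node(state))
```

Validation and autocomplete are kept inside each node, while the state that reaches the checkpointer contains nothing that can change type on restore.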
Option B: Keep Helpers (Current - Short Term)
Continue with state_helpers.py pattern.
- Effort: Done
- Risk: Low
- When: Now through next 1-2 sprints
Option C: State Interface Layer (Recommended - Medium Term)
Create accessor classes with automatic serialization/deserialization:
class GlobalSupervisorStateAccessor:
    def __init__(self, state: GlobalSupervisorState):
        self._state = state

    @property
    def planner_decision(self) -> GlobalPlannerDecision | None:
        raw = self._state.get("global_planner_decision")
        return GlobalPlannerDecision(**raw) if isinstance(raw, dict) else raw

    @planner_decision.setter
    def planner_decision(self, value: GlobalPlannerDecision):
        self._state["global_planner_decision"] = value.model_dump()
- Effort: 1-2 weeks
- Risk: Medium
- When: Next major sprint, or when adding new domain agent
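One way to keep an accessor class from growing a property pair per field is a small descriptor, so each model-typed field becomes a single line. This is a sketch under assumptions rather than a proposed final design: `ModelField` and `SupervisorStateAccessor` are hypothetical names, and `PlannerDecision` is a dataclass stand-in that exposes a `model_dump()` method only to mirror the Pydantic v2 method name.

```python
from dataclasses import asdict, dataclass
from typing import Any, Generic, Optional, Type, TypeVar

T = TypeVar("T")


class ModelField(Generic[T]):
    """Descriptor that stores a model as a dict and returns it as an object."""

    def __init__(self, key: str, model_cls: Type[T]):
        self.key = key
        self.model_cls = model_cls

    def __get__(self, accessor: Any, owner: Optional[type] = None) -> Any:
        if accessor is None:
            return self  # Accessed on the class, not on an instance
        raw = accessor._state.get(self.key)
        return self.model_cls(**raw) if isinstance(raw, dict) else raw

    def __set__(self, accessor: Any, value: T) -> None:
        # Always write the dict form so the checkpointer never sees the object.
        accessor._state[self.key] = value.model_dump()


@dataclass
class PlannerDecision:
    """Hypothetical stand-in exposing Pydantic v2's model_dump() name."""
    current_plan: str

    def model_dump(self) -> dict:
        return asdict(self)


class SupervisorStateAccessor:
    """Hypothetical accessor: one line per model-typed state field."""
    planner_decision = ModelField("global_planner_decision", PlannerDecision)

    def __init__(self, state: dict):
        self._state = state


state: dict = {}
accessor = SupervisorStateAccessor(state)
accessor.planner_decision = PlannerDecision(current_plan="plan A")
```

After the assignment, `state` holds a plain dict while `accessor.planner_decision` returns a typed object again. Because LangGraph nodes normally return partial updates rather than mutating their input, a real accessor would probably wrap the update dict that the node returns; that detail is left open here.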
When to Increase Priority
- ❗ High Priority if: Adding 2+ new domain agents (to avoid spreading pattern)
- ❗ High Priority if: Multiple developers encounter AttributeError bugs
- ❗ High Priority if: LangGraph 2.0 breaks current workaround
- ❗ High Priority if: Major state model refactoring planned
Success Criteria (For Full Remediation)
✅ State only contains primitives, dicts, lists, messages
✅ Pydantic used only for LLM output parsing
✅ No state_helpers.py workaround files needed
✅ All HITL scenarios work without special handling
✅ New developers can follow official LangGraph docs
✅ All tests pass (unit + integration with Redis)
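The first criterion could be checked mechanically. The sketch below walks a state dict and reports every value that is not a primitive, dict, or list; `find_unsafe_values` and `FakeModel` are hypothetical names, and the `allowed` parameter is an assumed escape hatch for types that are expected in state, such as message classes.

```python
from typing import Any, Iterator, Tuple

PRIMITIVES: Tuple[type, ...] = (str, int, float, bool, type(None))


def find_unsafe_values(
    value: Any, path: str = "state", allowed: Tuple[type, ...] = ()
) -> Iterator[str]:
    """Yield the path of every value that is not a primitive, dict, or list."""
    if isinstance(value, PRIMITIVES + allowed):
        return
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_unsafe_values(item, f"{path}[{key!r}]", allowed)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_unsafe_values(item, f"{path}[{index}]", allowed)
    else:
        yield f"{path} ({type(value).__name__})"


class FakeModel:
    """Hypothetical object standing in for a Pydantic model left in state."""


state = {
    "input": "hello",
    "agent_responses": {"wealth": [{"score": 0.9}]},
    "global_planner_decision": FakeModel(),
}
problems = list(find_unsafe_values(state))
print(problems)  # ["state['global_planner_decision'] (FakeModel)"]
```

Run against the output of each node in a test, a check like this would flag a model object that slipped into state before it ever reaches the checkpointer.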
Related
- Root Cause: LangGraph checkpoint serialization uses msgpack, Pydantic models become dicts
- Official Pattern: https://langchain-ai.github.io/langgraph/how-tos/human_in_the_loop/wait-user-input/
- GitHub Issue: https://github.com/langchain-ai/langgraph/issues/5733 (TypedDict recommendation)
- Code References:
  - backend/app/api/services/agents/global_supervisor_state.py (state definition)
  - backend/app/api/services/agents/*/state_helpers.py (workaround files)
  - backend/app/api/services/state_persistence/redis_checkpoint_service.py (checkpointer)
Status Updates
- 2025-11-27: Technical debt identified during HITL bug investigation
- 2025-11-27: Workaround (state_helpers.py) implemented for all agents
- 2025-11-27: decode_responses=False fix applied to Redis checkpointer
- Future: When implementing Option C, update Status to "In Progress"
- Future: When complete, update Status to "Resolved"
Notes
Developer Note:
When adding new Pydantic models to agent state:
1. Add a getter function to the agent's state_helpers.py
2. Use the helper function in nodes, never access state.get() directly for Pydantic fields
3. Consider if the field really needs to be Pydantic, or if a plain dict would suffice
Architecture Note: This is a known limitation of using Pydantic with LangGraph checkpointing. The official recommendation is to use TypedDict with primitive types for state, and Pydantic only for LLM structured output parsing. Our current architecture diverges from this pattern for developer ergonomics (validation, autocomplete) at the cost of needing workaround helpers.
Testing Note: Always test HITL flows with actual Redis checkpointer, not just in-memory. The serialization issue only manifests when state is persisted and restored from Redis.