TD-001: Trace Size Optimization¶
Status: Deferred
Created: 2026-01-13
Priority: Low
Estimated Effort: 15-20 days (full approach) / 4-6 days (simple approach)
Problem Statement¶
Trace JSON payloads are larger than necessary due to repetitive state objects. A simple "weather in zurich" request produces ~150KB of trace data because:
- State repetition: The same state fields are stored in both
inputandoutputof every observation - LLM prompt duplication: System prompts (~5KB each) appear identically across 5+ observations
- Memory domain bloat:
memory_domainandfacts_by_entityare copied verbatim through every observation
Example Breakdown (Weather Request Trace)¶
| Field | Size | Occurrences | Total |
|---|---|---|---|
memory_domain |
~8KB | 17 observations × 2 (in/out) | ~272KB uncompressed |
_llm_messages (system prompts) |
~5KB | 5 LLM calls | ~25KB |
facts_by_entity |
~3KB | 17 × 2 | ~102KB |
| Other fields | ~2KB | varies | ~30KB |
Note: PostgreSQL TOAST compression reduces actual storage to ~25-35KB, and we already compress for Redis transit (Phase 3).
Analysis¶
Does This Actually Matter?¶
| Factor | Impact | Assessment |
|---|---|---|
| Storage cost | PostgreSQL TOAST compresses to ~20-25% | Low impact |
| Redis transit | Already gzip compressed (Phase 3) | Already optimized |
| UI load time | 150KB over network | Marginal (gzip helps) |
| Developer experience | Large JSON hard to debug | Minor annoyance |
| Scale (10K traces/day) | ~90-130GB/year after compression | Manageable |
Verdict: Storage and performance costs are manageable. This is a "nice to have" optimization, not urgent.
Why We're Deferring¶
- PostgreSQL already compresses JSONB via TOAST - we're not paying the full 150KB per trace
- Redis transit already optimized in Phase 3 with gzip compression
- High implementation complexity for marginal additional benefit
- Risk to critical features - Experiments feature needs exact state reconstruction
- Engineering time better spent on user-facing features
Proposed Solutions (When Revisiting)¶
Option A: Full Delta Architecture (High Complexity)¶
Store state as: baseline + deltas + content blobs
Trace
├── baseline_state (stored once)
├── content_blobs (deduplicated by hash)
│ ├── llm_prompt_abc123
│ └── tool_result_def456
└── observations
├── obs_1: { input_delta, output_delta, _llm_prompt_ref }
└── obs_2: { input_delta, output_delta }
Pros: Maximum size reduction (~75-80%) Cons: - Requires reconstruction logic in SDK, backend, AND frontend - High risk to Experiments feature (needs exact prompts) - ~15-20 days of work
Option B: Simpler Targeted Optimizations (Recommended)¶
- Extract LLM prompts to deduplicated storage (~50% reduction)
- Store
_llm_messagesseparately, reference by content hash -
Low risk - just moving data, not reconstructing
-
Only store changed fields in output (~20% additional reduction)
- If
output.memory_domain == input.memory_domain, don't include in output -
Input still has full state for reference
-
Lazy load heavy fields in UI
- Don't send
_llm_messages,tool_calls_resultsunless requested - Faster initial trace load
Combined reduction: ~60-70% with low risk
Implementation Notes (For Future Reference)¶
SDK Changes Needed¶
# Option B, Item 1: Extract LLM prompts
def _capture_output(self, state: dict) -> dict:
output = state.copy()
if "_llm_messages" in output:
prompt_hash = hashlib.sha256(
json.dumps(output["_llm_messages"]).encode()
).hexdigest()[:16]
self._publish_prompt_blob(prompt_hash, output["_llm_messages"])
output["_llm_prompt_ref"] = prompt_hash
del output["_llm_messages"]
return output
# Option B, Item 2: Only store changed fields
def _build_observation_output(self, input_state: dict, output_state: dict) -> dict:
changed = {}
for key, value in output_state.items():
if key not in input_state or input_state[key] != value:
changed[key] = value
return changed
Backend Changes Needed¶
# New model for deduplicated content
class ContentBlob(Base):
__tablename__ = "content_blobs"
id = Column(String, primary_key=True) # SHA256 hash
content_type = Column(String) # llm_prompt, tool_result
content = Column(JSONB)
size_bytes = Column(Integer)
created_at = Column(DateTime)
Frontend Changes Needed¶
// Lazy load heavy fields
async function loadObservationDetails(obsId: string): Promise<ObservationDetails> {
// Initial load excludes heavy fields
// This fetches them on demand
return api.get(`/observations/${obsId}/details?include_heavy=true`);
}
Swisper Impact¶
- Latency: Negligible (~0.1-0.5ms for delta computation)
- Code changes: None required if we do Option B
- Risk: Low - tracing failures already gracefully handled
When to Revisit¶
Consider implementing when:
- Storage costs become significant (>500GB of trace data)
- UI performance complaints about trace loading times
- Export feature requested where size matters
- Significant idle engineering time available
Related Documents¶
- ADR-005: Graph-Level Auto-Instrumentation
- Plan: Performance Optimization v1
- Analysis: SDK Tracing Gaps
Decision Log¶
| Date | Decision | Rationale |
|---|---|---|
| 2026-01-13 | Defer implementation | ROI not justified given PostgreSQL compression and existing optimizations |