TD-001: Trace Size Optimization¶

Status: Deferred
Created: 2026-01-13
Priority: Low
Estimated Effort: 15-20 days (full approach) / 4-6 days (simple approach)

Problem Statement¶

Trace JSON payloads are larger than necessary due to repetitive state objects. A simple "weather in zurich" request produces ~150KB of trace data because:

State repetition: The same state fields are stored in both input and output of every observation
LLM prompt duplication: System prompts (~5KB each) appear identically across 5+ observations
Memory domain bloat: memory_domain and facts_by_entity are copied verbatim through every observation

Example Breakdown (Weather Request Trace)¶

Field	Size	Occurrences	Total
`memory_domain`	~8KB	17 observations × 2 (in/out)	~272KB uncompressed
`_llm_messages` (system prompts)	~5KB	5 LLM calls	~25KB
`facts_by_entity`	~3KB	17 × 2	~102KB
Other fields	~2KB	varies	~30KB

Note: PostgreSQL TOAST compression reduces actual storage to ~25-35KB, and we already compress for Redis transit (Phase 3).

Analysis¶

Does This Actually Matter?¶

Factor	Impact	Assessment
Storage cost	PostgreSQL TOAST compresses to ~20-25%	Low impact
Redis transit	Already gzip compressed (Phase 3)	Already optimized
UI load time	150KB over network	Marginal (gzip helps)
Developer experience	Large JSON hard to debug	Minor annoyance
Scale (10K traces/day)	~90-130GB/year after compression	Manageable

Verdict: Storage and performance costs are manageable. This is a "nice to have" optimization, not urgent.

Why We're Deferring¶

PostgreSQL already compresses JSONB via TOAST - we're not paying the full 150KB per trace
Redis transit already optimized in Phase 3 with gzip compression
High implementation complexity for marginal additional benefit
Risk to critical features - Experiments feature needs exact state reconstruction
Engineering time better spent on user-facing features

Proposed Solutions (When Revisiting)¶

Option A: Full Delta Architecture (High Complexity)¶

Store state as: baseline + deltas + content blobs

Trace
├── baseline_state (stored once)
├── content_blobs (deduplicated by hash)
│   ├── llm_prompt_abc123
│   └── tool_result_def456
└── observations
    ├── obs_1: { input_delta, output_delta, _llm_prompt_ref }
    └── obs_2: { input_delta, output_delta }

Pros: Maximum size reduction (~75-80%) Cons: - Requires reconstruction logic in SDK, backend, AND frontend - High risk to Experiments feature (needs exact prompts) - ~15-20 days of work

Option B: Simpler Targeted Optimizations (Recommended)¶

Extract LLM prompts to deduplicated storage (~50% reduction)
Store _llm_messages separately, reference by content hash
Low risk - just moving data, not reconstructing
Only store changed fields in output (~20% additional reduction)
If output.memory_domain == input.memory_domain, don't include in output
Input still has full state for reference
Lazy load heavy fields in UI
Don't send _llm_messages, tool_calls_results unless requested
Faster initial trace load

Combined reduction: ~60-70% with low risk

Implementation Notes (For Future Reference)¶

SDK Changes Needed¶

# Option B, Item 1: Extract LLM prompts
def _capture_output(self, state: dict) -> dict:
    output = state.copy()

    if "_llm_messages" in output:
        prompt_hash = hashlib.sha256(
            json.dumps(output["_llm_messages"]).encode()
        ).hexdigest()[:16]

        self._publish_prompt_blob(prompt_hash, output["_llm_messages"])
        output["_llm_prompt_ref"] = prompt_hash
        del output["_llm_messages"]

    return output

# Option B, Item 2: Only store changed fields
def _build_observation_output(self, input_state: dict, output_state: dict) -> dict:
    changed = {}
    for key, value in output_state.items():
        if key not in input_state or input_state[key] != value:
            changed[key] = value
    return changed

Backend Changes Needed¶

# New model for deduplicated content
class ContentBlob(Base):
    __tablename__ = "content_blobs"

    id = Column(String, primary_key=True)  # SHA256 hash
    content_type = Column(String)  # llm_prompt, tool_result
    content = Column(JSONB)
    size_bytes = Column(Integer)
    created_at = Column(DateTime)

Frontend Changes Needed¶

// Lazy load heavy fields
async function loadObservationDetails(obsId: string): Promise<ObservationDetails> {
  // Initial load excludes heavy fields
  // This fetches them on demand
  return api.get(`/observations/${obsId}/details?include_heavy=true`);
}

Swisper Impact¶

Latency: Negligible (~0.1-0.5ms for delta computation)
Code changes: None required if we do Option B
Risk: Low - tracing failures already gracefully handled

When to Revisit¶

Consider implementing when:

Storage costs become significant (>500GB of trace data)
UI performance complaints about trace loading times
Export feature requested where size matters
Significant idle engineering time available

ADR-005: Graph-Level Auto-Instrumentation
Plan: Performance Optimization v1
Analysis: SDK Tracing Gaps

Decision Log¶

Date	Decision	Rationale
2026-01-13	Defer implementation	ROI not justified given PostgreSQL compression and existing optimizations