Skip to content

TD-001: Trace Size Optimization

Status: Deferred
Created: 2026-01-13
Priority: Low
Estimated Effort: 15-20 days (full approach) / 4-6 days (simple approach)

Problem Statement

Trace JSON payloads are larger than necessary due to repetitive state objects. A simple "weather in zurich" request produces ~150KB of trace data because:

  1. State repetition: The same state fields are stored in both input and output of every observation
  2. LLM prompt duplication: System prompts (~5KB each) appear identically across 5+ observations
  3. Memory domain bloat: memory_domain and facts_by_entity are copied verbatim through every observation

Example Breakdown (Weather Request Trace)

Field Size Occurrences Total
memory_domain ~8KB 17 observations × 2 (in/out) ~272KB uncompressed
_llm_messages (system prompts) ~5KB 5 LLM calls ~25KB
facts_by_entity ~3KB 17 × 2 ~102KB
Other fields ~2KB varies ~30KB

Note: PostgreSQL TOAST compression reduces actual storage to ~25-35KB, and we already compress for Redis transit (Phase 3).

Analysis

Does This Actually Matter?

Factor Impact Assessment
Storage cost PostgreSQL TOAST compresses to ~20-25% Low impact
Redis transit Already gzip compressed (Phase 3) Already optimized
UI load time 150KB over network Marginal (gzip helps)
Developer experience Large JSON hard to debug Minor annoyance
Scale (10K traces/day) ~90-130GB/year after compression Manageable

Verdict: Storage and performance costs are manageable. This is a "nice to have" optimization, not urgent.

Why We're Deferring

  1. PostgreSQL already compresses JSONB via TOAST - we're not paying the full 150KB per trace
  2. Redis transit already optimized in Phase 3 with gzip compression
  3. High implementation complexity for marginal additional benefit
  4. Risk to critical features - Experiments feature needs exact state reconstruction
  5. Engineering time better spent on user-facing features

Proposed Solutions (When Revisiting)

Option A: Full Delta Architecture (High Complexity)

Store state as: baseline + deltas + content blobs

Trace
├── baseline_state (stored once)
├── content_blobs (deduplicated by hash)
│   ├── llm_prompt_abc123
│   └── tool_result_def456
└── observations
    ├── obs_1: { input_delta, output_delta, _llm_prompt_ref }
    └── obs_2: { input_delta, output_delta }

Pros: Maximum size reduction (~75-80%) Cons: - Requires reconstruction logic in SDK, backend, AND frontend - High risk to Experiments feature (needs exact prompts) - ~15-20 days of work

  1. Extract LLM prompts to deduplicated storage (~50% reduction)
  2. Store _llm_messages separately, reference by content hash
  3. Low risk - just moving data, not reconstructing

  4. Only store changed fields in output (~20% additional reduction)

  5. If output.memory_domain == input.memory_domain, don't include in output
  6. Input still has full state for reference

  7. Lazy load heavy fields in UI

  8. Don't send _llm_messages, tool_calls_results unless requested
  9. Faster initial trace load

Combined reduction: ~60-70% with low risk

Implementation Notes (For Future Reference)

SDK Changes Needed

# Option B, Item 1: Extract LLM prompts
def _capture_output(self, state: dict) -> dict:
    output = state.copy()

    if "_llm_messages" in output:
        prompt_hash = hashlib.sha256(
            json.dumps(output["_llm_messages"]).encode()
        ).hexdigest()[:16]

        self._publish_prompt_blob(prompt_hash, output["_llm_messages"])
        output["_llm_prompt_ref"] = prompt_hash
        del output["_llm_messages"]

    return output

# Option B, Item 2: Only store changed fields
def _build_observation_output(self, input_state: dict, output_state: dict) -> dict:
    changed = {}
    for key, value in output_state.items():
        if key not in input_state or input_state[key] != value:
            changed[key] = value
    return changed

Backend Changes Needed

# New model for deduplicated content
class ContentBlob(Base):
    __tablename__ = "content_blobs"

    id = Column(String, primary_key=True)  # SHA256 hash
    content_type = Column(String)  # llm_prompt, tool_result
    content = Column(JSONB)
    size_bytes = Column(Integer)
    created_at = Column(DateTime)

Frontend Changes Needed

// Lazy load heavy fields
async function loadObservationDetails(obsId: string): Promise<ObservationDetails> {
  // Initial load excludes heavy fields
  // This fetches them on demand
  return api.get(`/observations/${obsId}/details?include_heavy=true`);
}

Swisper Impact

  • Latency: Negligible (~0.1-0.5ms for delta computation)
  • Code changes: None required if we do Option B
  • Risk: Low - tracing failures already gracefully handled

When to Revisit

Consider implementing when:

  1. Storage costs become significant (>500GB of trace data)
  2. UI performance complaints about trace loading times
  3. Export feature requested where size matters
  4. Significant idle engineering time available

Decision Log

Date Decision Rationale
2026-01-13 Defer implementation ROI not justified given PostgreSQL compression and existing optimizations