Testing Guide for AI Agents¶
Version: v1.0
Last Updated: 2025-10-29
Last Updated By: heiko
Status: Active
Audience: AI Assistants (Cursor, Claude)
Purpose: How to write, mark, and execute tests following Swisper's TDD workflow
Changelog¶
v1.0 - 2025-10-29¶
- Initial creation for AI agents
- Docker container testing workflow
- CI test marking strategy
- Test templates and examples
- Quick reference checklist
🎯 Core Philosophy¶
Write comprehensive tests with real infrastructure, mark only critical tests for CI.
The Rules:¶
- ✅ Use real infrastructure (database, Redis, LLM)
- ✅ Mock ONLY when: cost prohibitive, unstable, or rate-limited
- ✅ Mark 1-2 critical tests per domain for CI
- ✅ Execute in Docker containers (not locally)
- ✅ Always run in verbose mode (-vv)
- ✅ Watch terminal output to verify status
🐳 Docker Container Testing Workflow¶
Mandatory 3-Step Process:¶
Every time you write or modify tests:
# Step 1: UPDATE container with latest code
docker compose cp apps/backend/swisper/. backend:/app/swisper/
docker compose cp backend/tests/. backend:/code/tests/
# Step 2: EXECUTE tests in verbose mode
docker compose exec backend pytest tests/api/test_my_feature.py -vv
# Step 3: WATCH terminal output and VERIFY status
# TDD Red: Tests should FAIL
# TDD Green: Tests should PASS
NEVER run tests locally with poetry run pytest - always use Docker!
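The three steps can also be scripted so that the update step is never skipped. The following is a minimal Python sketch, assuming the paths and the `backend` service name shown above; `build_workflow_commands` and `run_workflow` are hypothetical helper names, not part of Swisper.

```python
import subprocess

# Step 1 commands, copied verbatim from the workflow above.
UPDATE_COMMANDS = [
    ["docker", "compose", "cp", "apps/backend/swisper/.", "backend:/app/swisper/"],
    ["docker", "compose", "cp", "backend/tests/.", "backend:/code/tests/"],
]


def build_workflow_commands(test_path):
    """Return the two update commands followed by the verbose pytest command."""
    execute = ["docker", "compose", "exec", "backend", "pytest", test_path, "-vv"]
    return [*UPDATE_COMMANDS, execute]


def run_workflow(test_path):
    """Run each step in order and return the exit code of the pytest step.

    A failed copy raises CalledProcessError, so stale code is never tested.
    """
    *updates, execute = build_workflow_commands(test_path)
    for command in updates:
        subprocess.run(command, check=True)
    # No check=True here: a non-zero exit code is the expected result in TDD Red.
    return subprocess.run(execute).returncode
```

Calling `run_workflow("tests/api/test_my_feature.py")` from a terminal still prints the verbose output for Step 3. pytest exits with 0 when every test passes and 1 when some tests fail, so the return value can double-check what you read in the terminal.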
🏷️ CI Test Marking¶
Mark CI-Critical Tests:¶
Only 1-2 tests per domain run in CI to keep CI fast (<5 minutes).
# ✅ CI-Critical Test (runs in CI)
@pytest.mark.ci_critical
@pytest.mark.asyncio
async def test_fact_extraction_basic_flow(db, test_user):
    """CI: Basic fact extraction works"""
    result = await extract_facts("I'm allergic to peanuts", test_user.id)
    assert len(result.facts) > 0
    assert result.facts[0].type == "Allergy"
Criteria for CI-critical:
- ✅ Golden path (most important scenario)
- ✅ Fast (<5 seconds)
- ✅ No expensive LLM calls (or mocked)
- ✅ Regression prevention
Mark Comprehensive Tests to Skip CI:¶
Tests with real LLM calls or many scenarios should skip CI.
# ✅ Comprehensive Test (skip in CI, run locally)
import os
import pytest

# Module-level mark: applies to every test in this file
pytestmark = pytest.mark.skipif(
    os.getenv("CI") == "true",
    reason="CI: Uses real LLM calls - run locally only"
)

@pytest.mark.asyncio
async def test_entity_disambiguation_comprehensive(db, test_user):
    """
    Comprehensive: 10+ scenarios with real LLM.
    Tests edge cases, ambiguous entities, relationship filtering.
    """
    # Expensive but thorough - run locally and nightly
    # Test scenario 1
    # Test scenario 2
    # ... 10+ scenarios
    pass
Mark to skip CI if:
- ❌ Uses real LLM calls (expensive)
- ❌ Tests many edge case variations (not critical for every commit)
- ❌ Performance tests (run in nightly)
- ❌ External API integration (rate limits)
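The `skipif` condition used above is an exact string comparison against the `CI` environment variable. The sketch below isolates that check so its behavior is easy to see; `running_in_ci` is a hypothetical helper name, and the environment is passed in as a dictionary only to keep the example self-contained.

```python
import os


def running_in_ci(environ=None):
    """Mirror the skipif condition: true only when CI is exactly "true"."""
    environ = os.environ if environ is None else environ
    return environ.get("CI") == "true"


print(running_in_ci({"CI": "true"}))  # True
print(running_in_ci({"CI": "True"}))  # False: the comparison is case-sensitive
print(running_in_ci({}))              # False: the variable is not set
```

If your CI system sets `CI` to a different value such as `1` (worth verifying for your pipeline), the comprehensive tests would not be skipped.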
📋 Test Templates¶
Template 1: Business Value Test¶
@pytest.mark.asyncio
async def test_{feature}_{business_scenario}(db, test_user_with_avatar):
    """
    Business case: [Describe user scenario in plain English]
    Expected: [What should happen]
    """
    user_id, avatar_id, workspace_id = test_user_with_avatar
    service = MyService()
    input_data = MyInput(
        user_message="[Realistic user message]",
        user_id=str(user_id),
        avatar_id=str(avatar_id),
        workspace_id=str(workspace_id)
    )
    result = await service.execute(input_data, db=db)
    # Assert business outcome
    assert len(result.entities_created) == 1, \
        f"Should create 1 entity, got {len(result.entities_created)}"
Template 2: Edge Case Test¶
@pytest.mark.asyncio
async def test_{feature}_handles_{edge_case}(db, test_user):
    """Edge case: [Describe boundary condition]"""
    result = await function_under_test(
        edge_case_input,
        user_id=test_user.id
    )
    assert result.is_valid()
    # Or assert proper error handling
Template 3: Error Case Test¶
@pytest.mark.asyncio
async def test_{feature}_returns_error_for_{invalid_input}(db):
    """Error case: [Invalid input] should raise [specific error]"""
    with pytest.raises(ValueError, match="Expected error message"):
        await function_under_test(invalid_input, user_id="invalid")
🎯 TDD Workflow for Agents¶
Step 2: TDD (Red) - Write Tests¶
Actions:
1. ✅ Write 3-5 comprehensive business tests
2. ✅ Write 2-3 edge case tests
3. ✅ Write 1-2 error case tests
4. ✅ Mark 1-2 as CI-critical with @pytest.mark.ci_critical
5. ✅ Mark expensive LLM tests with skipif(CI)
6. 🔄 UPDATE container:
docker compose cp apps/backend/swisper/. backend:/app/swisper/
docker compose cp backend/tests/. backend:/code/tests/
Step 5: Implement (Green) - After Implementation¶
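Actions:
1. 🔄 UPDATE container: docker compose cp apps/backend/swisper/. backend:/app/swisper/
2. ⚡ EXECUTE: docker compose exec backend pytest <file> -vv
3. 👁️ WATCH terminal output
4. ✅ VERIFY: Tests PASS (green)
Step 6: Refactor - After Code Quality Improvements¶
Actions:
1. 🔄 UPDATE container: docker compose cp apps/backend/swisper/. backend:/app/swisper/
2. ⚡ EXECUTE: docker compose exec backend pytest <file> -vv
3. 👁️ WATCH terminal output
4. ✅ VERIFY: Tests STILL PASS
📊 CI Allocation Per Domain¶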
Goal: CI in <5 minutes total
| Domain | CI Tests | Local Tests | Example CI Test |
|---|---|---|---|
| Fact Extraction | 2 | 10+ | Basic extraction, allergy facts |
| Entity Disambiguation | 2 | 8+ | Colleague vs public figure |
| Intent Classification | 1 | 5+ | Simple chat detection |
| Preference Extraction | 1 | 6+ | Basic preference detection |
| API Endpoints | 1 each | Many | Chat creation, message send |
Total CI: 10-15 tests (~60-90 seconds)
Total Comprehensive: 50-100+ tests (run locally)
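As a sanity check, the numbers above can be related to each other with a little arithmetic. This sketch uses only figures stated in this guide (the "<5 seconds" criterion, the 10-15 test total, and the per-domain counts); it covers test runtime only and assumes nothing about container startup or other CI overhead.

```python
MAX_SECONDS_PER_CI_TEST = 5   # "Fast (<5 seconds)" criterion
CI_TEST_RANGE = (10, 15)      # "Total CI: 10-15 tests"
CI_BUDGET_SECONDS = 5 * 60    # "CI in <5 minutes total"

# Fixed per-domain counts from the table; API endpoints are "1 each".
fixed_domain_tests = {
    "Fact Extraction": 2,
    "Entity Disambiguation": 2,
    "Intent Classification": 1,
    "Preference Extraction": 1,
}

fixed_total = sum(fixed_domain_tests.values())
# Tests left over for API endpoints, given the stated 10-15 total.
api_tests_range = (CI_TEST_RANGE[0] - fixed_total, CI_TEST_RANGE[1] - fixed_total)
# Upper bound on test runtime if every test used its full 5 seconds.
worst_case_seconds = CI_TEST_RANGE[1] * MAX_SECONDS_PER_CI_TEST

print(fixed_total)                             # 6
print(api_tests_range)                         # (4, 9)
print(worst_case_seconds)                      # 75
print(worst_case_seconds < CI_BUDGET_SECONDS)  # True
```

The 75-second upper bound falls inside the ~60-90 second estimate and leaves most of the 5-minute budget for setup.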
✅ Test Writing Checklist¶
Before writing tests:
- [ ] Read docs/plans/plan_{feature}_v{version}.md for test scenarios
- [ ] Identify business-critical scenarios (1-2 for CI)
- [ ] Identify edge cases and error cases (skip CI)
While writing tests:
- [ ] Use real infrastructure (DB, Redis, LLM when needed)
- [ ] Write descriptive docstrings (business case + expected outcome)
- [ ] Use realistic test data (not "test" or "foo")
- [ ] Mark 1-2 as @pytest.mark.ci_critical
- [ ] Mark expensive LLM tests with skipif(CI)
After writing tests (TDD Red):
- [ ] Update container: docker compose cp apps/backend/swisper/. backend:/app/swisper/
- [ ] Update container: docker compose cp backend/tests/. backend:/code/tests/
- [ ] Execute: docker compose exec backend pytest <file> -vv
- [ ] Watch terminal output
- [ ] Verify tests FAIL (red)
After implementing (TDD Green):
- [ ] Update container: docker compose cp apps/backend/swisper/. backend:/app/swisper/
- [ ] Execute: docker compose exec backend pytest <file> -vv
- [ ] Watch terminal output
- [ ] Verify tests PASS (green)
After refactoring:
- [ ] Update container: docker compose cp apps/backend/swisper/. backend:/app/swisper/
- [ ] Execute: docker compose exec backend pytest <file> -vv
- [ ] Watch terminal output
- [ ] Verify tests STILL PASS
🚫 Common Mistakes¶
❌ DON'T:¶
# ❌ Running tests locally (not in Docker)
poetry run pytest tests/api/test_my_feature.py
# ❌ Not marking CI tests
@pytest.mark.asyncio  # Missing ci_critical marker!
async def test_critical_flow(db):
    pass
# ❌ Not skipping expensive tests in CI
@pytest.mark.asyncio  # Should have skipif(CI)!
async def test_comprehensive_llm_scenarios(db):
    # Uses real LLM - should skip CI!
    pass
# ❌ Not updating container before running tests
docker compose exec backend pytest ... # Missing docker compose cp step!
✅ DO:¶
# ✅ Mark CI-critical tests
@pytest.mark.ci_critical
@pytest.mark.asyncio
async def test_critical_flow(db):
    pass
# ✅ Skip expensive tests in CI
import os
import pytest
pytestmark = pytest.mark.skipif(
    os.getenv("CI") == "true",
    reason="CI: Uses real LLM calls"
)
@pytest.mark.asyncio
async def test_comprehensive_scenarios(db):
    pass
# ✅ Update container before executing tests
# In terminal/commands:
docker compose cp apps/backend/swisper/. backend:/app/swisper/
docker compose cp backend/tests/. backend:/code/tests/
docker compose exec backend pytest tests/api/test_my_feature.py -vv
📚 pytest Configuration¶
Ensure backend/pytest.ini has:
[pytest]
markers =
    ci_critical: Critical tests that run in CI (fast, golden path)
    asyncio: Async tests
    performance: Performance/load tests (skip in CI)
Run different test suites:
# CI tests only (what runs in CI)
docker compose exec backend pytest -m ci_critical -vv
# All tests (comprehensive local testing)
docker compose exec backend pytest -vv
# Skip CI tests (run comprehensive only)
docker compose exec backend pytest -m "not ci_critical" -vv
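The `-m` selections above rely on the markers being registered. pytest.ini is the place this guide names for that; as an illustration of what registration does, the same marker lines can be added from a `conftest.py` through pytest's `pytest_configure` hook and `config.addinivalue_line`. This is a sketch of an equivalent, not a description of how Swisper is set up, and the `MARKERS` constant is a hypothetical name.

```python
# conftest.py (sketch): mirrors the markers section of pytest.ini

MARKERS = [
    "ci_critical: Critical tests that run in CI (fast, golden path)",
    "asyncio: Async tests",
    "performance: Performance/load tests (skip in CI)",
]


def pytest_configure(config):
    """Register each marker line, as the ini `markers` option would."""
    for line in MARKERS:
        config.addinivalue_line("markers", line)
```

Running `pytest --markers` inside the container lists the registered markers, which is a quick way to confirm that `ci_critical` is known before relying on `-m ci_critical`.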
🔍 Quick Command Reference¶
# ====================
# BEFORE Running Tests
# ====================
# Always update container first!
docker compose cp apps/backend/swisper/. backend:/app/swisper/
docker compose cp backend/tests/. backend:/code/tests/
# ====================
# EXECUTE Tests
# ====================
# Specific test file
docker compose exec backend pytest tests/api/test_my_feature.py -vv
# Specific test function
docker compose exec backend pytest tests/api/test_my_feature.py::test_function_name -vv
# All tests
docker compose exec backend pytest -vv
# CI-critical tests only
docker compose exec backend pytest -m ci_critical -vv
# ====================
# VERIFY Status
# ====================
# Watch terminal for:
# - PASSED (green) ✅
# - FAILED (red) ❌
# - Test names and progress
# - Assertion messages
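When verifying status, it can help to count results instead of scanning by eye. The sketch below tallies per-test result lines from `-vv` output, assuming the usual verbose line shape of a test ID followed by its status; `summarize` and the test name `test_other_case` are hypothetical, and the sample text is hand-written for illustration, not a real log.

```python
import re
from collections import Counter

# Matches verbose result lines such as:
#   tests/api/test_my_feature.py::test_function_name PASSED [ 50%]
RESULT_LINE = re.compile(
    r"^(?P<test_id>\S+::\S+)\s+(?P<status>PASSED|FAILED|ERROR|SKIPPED)\b"
)


def summarize(output):
    """Count per-test statuses found in verbose pytest output."""
    counts = Counter()
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Hand-written sample in the shape of verbose output.
sample = """\
tests/api/test_my_feature.py::test_function_name PASSED [ 50%]
tests/api/test_my_feature.py::test_other_case FAILED [100%]
"""

counts = summarize(sample)
print(counts["PASSED"], counts["FAILED"])  # 1 1
```

In TDD Red you would expect the `FAILED` count to cover the new tests; in TDD Green and after refactoring it should be zero.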
📖 Related Documentation¶
- Workflow Rules: .cursor/rules/00-workflow.mdc - Complete TDD workflow
- Development SOP: .cursor/rules/development-sop.mdc - Testing in workflow
- Common Mistakes: .cursor/rules/35-implementation-common-mistakes.mdc - CI marking examples
- Testing Guide (Developers): docs/guides/TESTING_GUIDE.md - Conceptual overview
Remember: Real infrastructure + Docker containers + CI selection = Quality + Speed! 🎯