Voice System

The Voice System provides speech-to-text (STT) and text-to-speech (TTS) capabilities for Swisper, enabling fully voice-driven conversations. It integrates with Azure Speech Services and uses WebSocket streaming for real-time audio processing.

On the input side, voice audio is streamed from the browser, transcribed to text, and fed into the Global Supervisor. On the output side, the generated text response is converted to speech and streamed back to the user.
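The round trip described above can be sketched as three stages chained together. This is a minimal illustration only, not Swisper's implementation: `run_voice_turn`, `VoiceTurn`, and the three `fake_*` stages are hypothetical names, and the stand-in stages simply pass bytes through so the example runs without Azure Speech Services, a browser, or a WebSocket connection.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class VoiceTurn:
    """Result of one voice turn (hypothetical container for this sketch)."""
    transcript: str
    reply_text: str
    audio_chunks: List[bytes]


def run_voice_turn(
    audio_in: Iterable[bytes],
    transcribe: Callable[[Iterable[bytes]], str],
    supervise: Callable[[str], str],
    synthesize: Callable[[str], Iterator[bytes]],
) -> VoiceTurn:
    """Run one turn: audio in -> transcript -> reply text -> audio out."""
    transcript = transcribe(audio_in)             # STT stage
    reply_text = supervise(transcript)            # stands in for the Global Supervisor
    audio_chunks = list(synthesize(reply_text))   # TTS stage, collected chunk by chunk
    return VoiceTurn(transcript, reply_text, audio_chunks)


# Stand-in stages (assumptions for illustration, not real STT/TTS).
def fake_transcribe(chunks: Iterable[bytes]) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return b"".join(chunks).decode("utf-8")


def fake_supervise(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str, chunk_size: int = 8) -> Iterator[bytes]:
    # Emit the reply in fixed-size chunks, mimicking streamed audio.
    data = text.encode("utf-8")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


turn = run_voice_turn(
    [b"hello ", b"world"], fake_transcribe, fake_supervise, fake_synthesize
)
```

Passing the stages in as callables keeps the flow independent of any particular speech provider. In a real deployment each stage would presumably be asynchronous and streaming rather than run to completion as it is here.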

Key Components

Component	Purpose
STT Integration	Speech-to-text via Azure Speech Services with streaming transcription
TTS Integration	Text-to-speech with configurable voice and language settings
WebSocket Handler	Real-time audio streaming between browser and backend
Voice Session Manager	Manages voice session lifecycle and audio format negotiation
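
As an illustration of what session lifecycle and audio format negotiation could look like, the sketch below picks the first client-preferred format that the server also supports and moves the session through three states. Everything here is an assumption made for the example: the `AudioFormat`, `SessionState`, and `VoiceSession` names, the state set, the "first client preference wins" rule, and the sample formats are not taken from Swisper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical audio format descriptor."""
    codec: str
    sample_rate_hz: int


class SessionState(Enum):
    NEGOTIATING = "negotiating"
    ACTIVE = "active"
    CLOSED = "closed"


def negotiate(
    client_formats: Sequence[AudioFormat],
    server_formats: Sequence[AudioFormat],
) -> Optional[AudioFormat]:
    """Return the first client-preferred format the server also supports."""
    supported = set(server_formats)
    for fmt in client_formats:
        if fmt in supported:
            return fmt
    return None


class VoiceSession:
    """Hypothetical session object: negotiate once, then stay active until closed."""

    def __init__(self, server_formats: Sequence[AudioFormat]) -> None:
        self.server_formats = list(server_formats)
        self.state = SessionState.NEGOTIATING
        self.audio_format: Optional[AudioFormat] = None

    def open(self, client_formats: Sequence[AudioFormat]) -> bool:
        chosen = negotiate(client_formats, self.server_formats)
        if chosen is None:
            # No common format: the session cannot carry audio, so close it.
            self.state = SessionState.CLOSED
            return False
        self.audio_format = chosen
        self.state = SessionState.ACTIVE
        return True

    def close(self) -> None:
        self.state = SessionState.CLOSED


# Example formats are illustrative only.
pcm16k = AudioFormat("pcm", 16000)
opus48k = AudioFormat("opus", 48000)

session = VoiceSession(server_formats=[pcm16k])
opened = session.open(client_formats=[opus48k, pcm16k])
```

Here the client prefers `opus48k`, but the server only lists `pcm16k`, so the session settles on `pcm16k` and becomes active.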

Documentation Sections

Content Status

Content for the audience sections (Overview, Architecture, Operations) will be added during content migration (PR-6, SP-62). Until then, the section placeholders exist for navigation only.