Voice System
The Voice System provides speech-to-text (STT) and text-to-speech (TTS) capabilities for Swisper, enabling fully voice-driven conversations. It integrates with Azure Speech Services and uses WebSocket streaming for real-time audio processing.
On the input side, voice audio is streamed from the browser, transcribed to text, and fed into the Global Supervisor. On the output side, the generated text response is converted to speech and streamed back to the user.
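The input and output paths described above can be pictured as a single conversational turn. The following is a minimal sketch, not Swisper's actual implementation: `voice_turn`, the three stage callables, and the stub functions are all hypothetical names introduced for illustration. In a real deployment, the stubs would be replaced by the Azure Speech Services STT/TTS calls and the Global Supervisor.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical stage signatures; none of these names come from Swisper itself.
Transcribe = Callable[[AsyncIterator[bytes]], Awaitable[str]]
Respond = Callable[[str], Awaitable[str]]
Synthesize = Callable[[str], AsyncIterator[bytes]]


async def voice_turn(
    audio_in: AsyncIterator[bytes],
    transcribe: Transcribe,
    respond: Respond,
    synthesize: Synthesize,
) -> AsyncIterator[bytes]:
    """Run one turn: audio in -> text -> response text -> audio out."""
    text = await transcribe(audio_in)        # STT stage
    reply = await respond(text)              # stands in for the Global Supervisor
    async for chunk in synthesize(reply):    # TTS stage, streamed chunk by chunk
        yield chunk


# --- Stubs so the sketch runs without any external service (assumptions) ---

async def fake_transcribe(audio: AsyncIterator[bytes]) -> str:
    total = 0
    async for frame in audio:
        total += len(frame)
    return f"heard {total} bytes"


async def fake_respond(text: str) -> str:
    return text.upper()


async def fake_synthesize(text: str) -> AsyncIterator[bytes]:
    data = text.encode("utf-8")
    for i in range(0, len(data), 4):  # emit small chunks to mimic streaming
        yield data[i:i + 4]


async def mic_frames() -> AsyncIterator[bytes]:
    for frame in (b"\x00\x01", b"\x02\x03\x04"):
        yield frame


async def run_demo() -> list:
    return [
        chunk
        async for chunk in voice_turn(
            mic_frames(), fake_transcribe, fake_respond, fake_synthesize
        )
    ]


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

Passing the stages in as callables keeps the turn logic independent of any particular speech provider, which also makes it straightforward to test without network access.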
Key Components
| Component | Purpose |
|---|---|
| STT Integration | Speech-to-text via Azure Speech Services with streaming transcription |
| TTS Integration | Text-to-speech with configurable voice and language settings |
| WebSocket Handler | Real-time audio streaming between browser and backend |
| Voice Session Manager | Manages voice session lifecycle and audio format negotiation |
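
As an illustration of what audio format negotiation in the Voice Session Manager might involve, the sketch below picks the first client-offered format that the server also supports. This is an assumption about the general technique, not a description of Swisper's protocol: `AudioFormat`, `SERVER_SUPPORTED`, `negotiate`, and the listed codecs and sample rates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical description of an audio stream format."""
    codec: str
    sample_rate_hz: int
    channels: int = 1


# Assumed server capabilities, chosen only for the example.
SERVER_SUPPORTED = (
    AudioFormat("pcm_s16le", 16000),
    AudioFormat("opus", 48000),
)


def negotiate(
    client_offers: Sequence[AudioFormat],
    supported: Sequence[AudioFormat] = SERVER_SUPPORTED,
) -> Optional[AudioFormat]:
    """Return the first client offer the server supports, or None."""
    for offer in client_offers:  # client order expresses its preference
        if offer in supported:
            return offer
    return None


if __name__ == "__main__":
    offers = [AudioFormat("flac", 44100), AudioFormat("opus", 48000)]
    print(negotiate(offers))
```

Honoring the client's ordering is one possible design choice; a server-preference ordering would work equally well, as long as both sides agree on which one applies.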
Documentation Sections
Content Status
Content for the audience sections (Overview, Architecture, Operations) will be populated during content migration (PR-6, SP-62). Until then, the sections exist only as placeholders for navigation.