A high-performance real-time voice processing server built in Rust that provides unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs.
- Unified Voice API: Single interface for multiple STT/TTS providers
- Real-time Processing: WebSocket-based bidirectional audio streaming
- LiveKit Integration: WebRTC audio streaming with room-based communication
- Advanced Noise Filtering: Optional DeepFilterNet integration (noise-filter feature)
- Provider Flexibility: Pluggable architecture supporting multiple providers
- Deepgram (STT/TTS)
- ElevenLabs (STT/TTS)
- Google Cloud (STT/TTS) - WaveNet, Neural2, and Studio voices
- Microsoft Azure (STT/TTS) - 400+ neural voices across 140+ languages
- Audio-Disabled Mode: Development mode without API keys
- Docker
- At least one provider API key (optional - can run in audio-disabled mode)
docker run -d \
  -p 3001:3001 \
  -e DEEPGRAM_API_KEY=your-key \
  saynaai/sayna

The server will be available at http://localhost:3001.
version: "3.9"
services:
  sayna:
    image: saynaai/sayna
    ports:
      - "3001:3001"
    environment:
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
      CACHE_PATH: /data/cache
    volumes:
      - sayna-cache:/data/cache
volumes:
  sayna-cache: {}

For complete Docker documentation including LiveKit integration, see docs/docker.md.
You can run Sayna without Deepgram or ElevenLabs API keys by using the audio-disabled mode. Simply start the server without configuring the API keys, then send a WebSocket configuration message with audio_disabled: true:
{
  "type": "config",
  "config": {
    "audio_disabled": true,
    "stt_provider": "deepgram",
    "tts_provider": "elevenlabs"
  }
}

This mode is useful for:
- Local development and testing
- UI/UX development without audio processing
- Testing WebSocket message flows
- Debugging non-audio features
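
As a rough client-side illustration, the audio-disabled configuration message can be assembled as a JSON string before it is sent over the WebSocket. The helper name `config_message` is hypothetical, and the sketch assumes provider names are plain ASCII identifiers, so it performs no JSON escaping.

```rust
/// Builds a `config` message with the same fields as the JSON example above.
/// Hypothetical helper: provider names are assumed to be plain ASCII
/// identifiers, so no JSON escaping is applied.
fn config_message(stt_provider: &str, tts_provider: &str, audio_disabled: bool) -> String {
    format!(
        r#"{{"type":"config","config":{{"audio_disabled":{},"stt_provider":"{}","tts_provider":"{}"}}}}"#,
        audio_disabled, stt_provider, tts_provider
    )
}
```

A real client would more likely use a JSON library for serialization; the string template only keeps this sketch dependency-free.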
Sayna supports customer-based authentication that delegates token validation to an external authentication service. When enabled, protected API endpoints require a valid bearer token.
Add to your .env file:
AUTH_REQUIRED=true
AUTH_SERVICE_URL=https://your-auth-service.com/auth
AUTH_SIGNING_KEY_PATH=/path/to/auth_private_key.pem
AUTH_TIMEOUT_SECONDS=5

Generate signing keys:
# Generate RSA private key
openssl genrsa -out auth_private_key.pem 2048
# Extract public key (share with auth service)
openssl rsa -in auth_private_key.pem -pubout -out auth_public_key.pem

Example authenticated request:
curl -X POST http://localhost:3001/speak \
  -H "Authorization: Bearer your-token-here" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

For complete authentication setup and architecture details, see docs/authentication.md.
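
To make the bearer-token requirement concrete, here is a minimal sketch of pulling the token out of an Authorization header value. The function `bearer_token` is hypothetical and is not taken from the Sayna codebase. It only parses the header; validating the token remains the job of the external auth service.

```rust
/// Extracts the token from an `Authorization` header value of the form
/// `Bearer <token>`. Returns `None` for other schemes or an empty token.
/// Hypothetical helper for illustration only.
fn bearer_token(header_value: &str) -> Option<&str> {
    // Split the scheme from the credentials at the first space.
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    // The scheme name is compared case-insensitively.
    if scheme.eq_ignore_ascii_case("Bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}
```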
- Endpoint: /ws
- Protocol: WebSocket
- Purpose: Real-time bidirectional audio streaming and control
Configuration Message:
{
  "type": "config",
  "config": {
    "stt_provider": "deepgram",
    "tts_provider": "elevenlabs",
    "audio_disabled": false,
    "deepgram_model": "nova-2",
    "elevenlabs_voice_id": "voice_id_here"
  }
}

Audio Input: Binary audio data (16 kHz, 16-bit PCM)
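
The audio input format can be illustrated with a small sketch that converts normalized floating-point samples into 16-bit PCM bytes and computes the size of a chunk. Little-endian byte order and mono audio are assumptions here, since the text above specifies only the sample rate and bit depth. The function names are hypothetical.

```rust
/// Sample rate stated above for WebSocket audio input.
const SAMPLE_RATE_HZ: usize = 16_000;
/// 16-bit PCM uses two bytes per sample.
const BYTES_PER_SAMPLE: usize = 2;

/// Converts normalized f32 samples (-1.0..=1.0) to 16-bit PCM bytes.
/// Assumption: little-endian byte order, single channel.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * BYTES_PER_SAMPLE);
    for &s in samples {
        // Clamp first so out-of-range input cannot wrap around.
        let scaled = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        out.extend_from_slice(&scaled.to_le_bytes());
    }
    out
}

/// Number of bytes in a mono chunk of the given duration.
fn chunk_bytes(duration_ms: usize) -> usize {
    SAMPLE_RATE_HZ * duration_ms / 1000 * BYTES_PER_SAMPLE
}
```

Under these assumptions a 20 ms chunk is 320 samples, or 640 bytes.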
Text Input:
{
  "type": "text",
  "text": "Convert this text to speech"
}

- Health Check: GET / - Server health check endpoint
- Voices: GET /voices - List available TTS voices (requires auth if AUTH_REQUIRED=true)
- Speak: POST /speak - Generate speech from text (requires auth if AUTH_REQUIRED=true)
- LiveKit Token: POST /livekit/token - Generate LiveKit participant token (requires auth if AUTH_REQUIRED=true)
- LiveKit Webhook: POST /livekit/webhook - Webhook endpoint for LiveKit events (unauthenticated, uses LiveKit signature verification)
  - Called by LiveKit to deliver room and participant events
  - Validates requests using LiveKit's JWT signature mechanism
  - Logs SIP-related attributes for phone call troubleshooting
  - See docs/livekit_webhook.md for details
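
The authentication rules in the endpoint list above can be summarized as a small decision function. This is a sketch, not Sayna's routing code: the name `requires_bearer_token` is hypothetical. Two choices are assumptions: the health check is treated as public because it carries no auth note, and unknown paths are treated as protected.

```rust
/// Returns true when a request must carry a bearer token, following the
/// endpoint list above. Hypothetical helper for illustration only.
fn requires_bearer_token(method: &str, path: &str, auth_required: bool) -> bool {
    match (method, path) {
        // The health check has no auth note (assumed public), and the
        // LiveKit webhook relies on LiveKit signature verification instead.
        ("GET", "/") | ("POST", "/livekit/webhook") => false,
        // GET /voices, POST /speak, and POST /livekit/token follow the
        // AUTH_REQUIRED setting. Unknown routes are treated the same way,
        // which is an assumption of this sketch.
        _ => auth_required,
    }
}
```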
- VoiceManager: Central coordinator for all voice processing operations
- Provider System: Trait-based abstraction for pluggable STT/TTS providers
- WebSocket Handler: Real-time communication and message routing
- LiveKit Integration: WebRTC audio streaming and room management
- DeepFilterNet: Advanced noise reduction with adaptive processing
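
The trait-based provider abstraction might look roughly like the following sketch. Every name in it (`SttProvider`, `EchoStt`, `ProviderRegistry`) is hypothetical, and the actual traits in Sayna may have a different shape, for example async methods and streaming results.

```rust
use std::collections::HashMap;

/// Hypothetical STT provider interface; the real trait may differ.
trait SttProvider {
    fn name(&self) -> &'static str;
    fn transcribe(&self, pcm: &[u8]) -> String;
}

/// Stand-in provider that only reports how many bytes it received.
struct EchoStt;

impl SttProvider for EchoStt {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn transcribe(&self, pcm: &[u8]) -> String {
        format!("{} bytes", pcm.len())
    }
}

/// Registry that resolves a provider by the name given in the config message.
#[derive(Default)]
struct ProviderRegistry {
    stt: HashMap<&'static str, Box<dyn SttProvider>>,
}

impl ProviderRegistry {
    fn register(&mut self, provider: Box<dyn SttProvider>) {
        self.stt.insert(provider.name(), provider);
    }

    fn stt(&self, name: &str) -> Option<&dyn SttProvider> {
        self.stt.get(name).map(|p| &**p)
    }
}
```

Keeping providers behind a trait object lets the coordinator select one at runtime from the name in the configuration message, without depending on any concrete provider type.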
- Client establishes WebSocket connection to /ws
- Client sends configuration with provider selection
- Audio processing pipeline:
  - STT: Audio → Noise Filter (optional) → STT Provider → Text
  - TTS: Text → TTS Provider → Audio → Client
- LiveKit mode enables room-based audio streaming
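
The STT direction of the pipeline above, with its optional noise filter stage, can be sketched with plain function pointers. The stage types and the toy `gate` and `count_voiced` functions are hypothetical stand-ins. They do not reflect how DeepFilterNet or any real provider behaves.

```rust
/// Hypothetical stage types; the names are illustrative, not Sayna's API.
type NoiseFilter = fn(Vec<i16>) -> Vec<i16>;
type SttEngine = fn(&[i16]) -> String;

/// STT direction: Audio -> Noise Filter (optional) -> STT Provider -> Text.
fn run_stt(audio: Vec<i16>, filter: Option<NoiseFilter>, stt: SttEngine) -> String {
    // Skip the filter stage entirely when none is configured.
    let cleaned = match filter {
        Some(f) => f(audio),
        None => audio,
    };
    stt(&cleaned)
}

/// Toy filter: zeroes samples whose amplitude is below a fixed threshold.
fn gate(samples: Vec<i16>) -> Vec<i16> {
    samples
        .into_iter()
        .map(|s| if s.unsigned_abs() < 100 { 0 } else { s })
        .collect()
}

/// Toy "provider": reports how many non-silent samples it saw.
fn count_voiced(samples: &[i16]) -> String {
    let voiced = samples.iter().filter(|&&s| s != 0).count();
    format!("{voiced} voiced samples")
}
```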
For local development, you can build and run from source.
- Rust 1.88.0 or later
- Optional: ONNX Runtime (for turn detection feature)
# Development build
cargo build
# Release build (optimized)
cargo build --release
# Run the server
cargo run
# Run with a config file
cargo run -- -c config.yaml

Sayna exposes several Cargo features that gate heavyweight subsystems:
- turn-detect: ONNX-based speech turn detection
- noise-filter: DeepFilterNet noise suppression pipeline
- openapi: OpenAPI 3.1 specification generation
# Run with turn detection
cargo run --features turn-detect
# Run with noise filter
cargo run --features noise-filter
# Run with multiple features
cargo run --features turn-detect,noise-filter,openapi

The Docker image includes turn-detect and noise-filter by default.
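
For readers unfamiliar with Cargo features, the sketch below shows the general mechanism: code can check at compile time which features were enabled. The function `enabled_features` is hypothetical. How Sayna itself gates its subsystems is not shown in this README.

```rust
/// Lists which of the Cargo features named above were enabled at compile
/// time. `cfg!` evaluates to a constant `true` or `false` during compilation.
fn enabled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(feature = "turn-detect") {
        enabled.push("turn-detect");
    }
    if cfg!(feature = "noise-filter") {
        enabled.push("noise-filter");
    }
    if cfg!(feature = "openapi") {
        enabled.push("openapi");
    }
    enabled
}
```

Heavier subsystems are usually gated with the `#[cfg(feature = "...")]` attribute rather than the `cfg!` macro, so the code and its dependencies are left out of the build entirely when the feature is off.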
# Run all tests
cargo test
# Run specific test
cargo test test_name
# Run with output
cargo test -- --nocapture

# Format code
cargo fmt
# Run linter
cargo clippy
# Check for security vulnerabilities
cargo audit

# Build Docker image
docker build -t sayna .
# Run container
docker run -p 3001:3001 --env-file .env sayna

| Variable | Description | Default | Required |
|---|---|---|---|
| DEEPGRAM_API_KEY | Deepgram API authentication | - | No* |
| ELEVENLABS_API_KEY | ElevenLabs API authentication | - | No* |
| GOOGLE_APPLICATION_CREDENTIALS | Path to Google Cloud service account JSON | - | No* |
| AZURE_SPEECH_SUBSCRIPTION_KEY | Azure Speech Services subscription key | - | No* |
| AZURE_SPEECH_REGION | Azure region (e.g., eastus, westeurope) | eastus | No* |
| LIVEKIT_URL | LiveKit server WebSocket URL | ws://localhost:7880 | No |
| LIVEKIT_API_KEY | LiveKit API key (for webhooks and token generation) | - | No*** |
| LIVEKIT_API_SECRET | LiveKit API secret (for webhooks and token generation) | - | No*** |
| HOST | Server bind address | 0.0.0.0 | No |
| PORT | Server port | 3001 | No |
| AUTH_REQUIRED | Enable authentication | false | No |
| AUTH_API_SECRETS_JSON | API secrets JSON array ([{id, secret}]) | - | Yes** |
| AUTH_API_SECRET | Legacy single API secret | - | No**** |
| AUTH_API_SECRET_ID | Legacy API secret id for AUTH_API_SECRET | default | No |
| AUTH_SERVICE_URL | External auth service endpoint | - | Yes** |
| AUTH_SIGNING_KEY_PATH | Path to JWT signing private key | - | Yes** |
| AUTH_TIMEOUT_SECONDS | Auth request timeout | 5 | No |

*Not required when using audio-disabled mode
**Required when AUTH_REQUIRED=true for the auth method you choose
***Required for LiveKit webhook validation and token generation features
****Legacy single-secret fallback; prefer AUTH_API_SECRETS_JSON
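
The defaults in the table above can be illustrated with a minimal sketch of loading a few settings with fallbacks. The struct and function names are hypothetical. Parsing AUTH_REQUIRED as a strict `true`/`false` value is an assumption, since the table does not say which spellings Sayna accepts.

```rust
#[derive(Debug, PartialEq)]
struct ServerSettings {
    host: String,
    port: u16,
    auth_required: bool,
    auth_timeout_seconds: u64,
}

/// Parses an optional raw value, falling back to `default` when unset.
fn parse_or<T: std::str::FromStr>(key: &str, raw: Option<String>, default: T) -> Result<T, String> {
    match raw {
        Some(v) => v
            .trim()
            .parse::<T>()
            .map_err(|_| format!("invalid value for {key}: {v}")),
        None => Ok(default),
    }
}

/// Builds settings from a lookup function so the logic can be exercised
/// without touching the process environment. Pass
/// `|k| std::env::var(k).ok()` to read real environment variables.
fn load_settings(get: impl Fn(&str) -> Option<String>) -> Result<ServerSettings, String> {
    Ok(ServerSettings {
        host: get("HOST").unwrap_or_else(|| "0.0.0.0".to_string()),
        port: parse_or("PORT", get("PORT"), 3001)?,
        auth_required: parse_or("AUTH_REQUIRED", get("AUTH_REQUIRED"), false)?,
        auth_timeout_seconds: parse_or("AUTH_TIMEOUT_SECONDS", get("AUTH_TIMEOUT_SECONDS"), 5)?,
    })
}
```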
Sayna supports first-class SIP configuration for managing SIP-specific settings. See docs/sip_config.md for detailed documentation.
YAML Configuration:
sip:
  room_prefix: "sip-"
  allowed_addresses:
    - "192.168.1.0/24"
    - "10.0.0.1"
  hook_secret: "your-signing-secret"  # Required if hooks configured
  hooks:
    - host: "example.com"
      url: "https://webhook.example.com/events"

Important: All SIP webhook forwarding requests are signed with HMAC-SHA256. You must configure hook_secret (or per-hook secret overrides) if using sip.hooks. See docs/livekit_webhook.md#webhook-signing for signature verification examples.
Environment Variables:
- SIP_ROOM_PREFIX: Room name prefix for SIP calls (required if SIP enabled)
- SIP_ALLOWED_ADDRESSES: Comma-separated IP addresses/CIDRs
- SIP_HOOK_SECRET: Global signing secret for webhook authentication (min 16 chars)
- SIP_HOOKS_JSON: JSON array of webhook configurations (with optional per-hook secret field)
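
To illustrate the comma-separated address list, here is a rough sketch that parses entries in the style of SIP_ALLOWED_ADDRESSES and checks whether a caller address matches. It handles IPv4 only, which is a simplifying assumption. The type `Ipv4Cidr` and the function `parse_allowlist` are hypothetical and not part of Sayna.

```rust
use std::net::Ipv4Addr;

/// One allowlist entry: a network address and prefix length.
/// A bare address such as `10.0.0.1` is treated as a /32.
#[derive(Debug, PartialEq)]
struct Ipv4Cidr {
    network: Ipv4Addr,
    prefix_len: u8,
}

impl Ipv4Cidr {
    fn parse(entry: &str) -> Option<Self> {
        let entry = entry.trim();
        let (addr, len) = match entry.split_once('/') {
            Some((a, l)) => (a, l.parse::<u8>().ok()?),
            None => (entry, 32),
        };
        if len > 32 {
            return None;
        }
        Some(Self { network: addr.parse().ok()?, prefix_len: len })
    }

    fn contains(&self, ip: Ipv4Addr) -> bool {
        // A /0 prefix matches everything; shifting a u32 by 32 would overflow.
        let mask = if self.prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - self.prefix_len)
        };
        (u32::from(self.network) & mask) == (u32::from(ip) & mask)
    }
}

/// Parses a comma-separated list; returns `None` if any entry is invalid.
fn parse_allowlist(raw: &str) -> Option<Vec<Ipv4Cidr>> {
    raw.split(',').map(Ipv4Cidr::parse).collect()
}
```

With the values from the YAML example, 192.168.1.42 matches the /24 entry, while 10.0.0.2 matches neither entry.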
- DeepFilterNet: CPU-intensive processing uses thread pooling
- Audio Buffering: Optimized chunk processing for low latency
- Connection Reuse: Provider connections are maintained for efficiency
- Async Processing: Non-blocking WebSocket message handling
- Memory Management: Careful buffer management in audio loops
- Review the development rules in .cursor/rules/:
  - rust.mdc: Rust best practices
  - core.mdc: Business logic specifications
  - axum.mdc: Framework patterns
  - livekit.mdc: LiveKit integration details
- Follow the existing code patterns and conventions
- Add tests for new features
- Ensure cargo fmt and cargo clippy pass

For issues, questions, or contributions, please visit the GitHub repository.
