Is your feature request related to a problem? Please describe.
Currently, Kaapi only supports text-based user inputs. This limits our ability to:
- Process audio and voice-based inputs from users
- Support voice-enabled chatbot experiences for partners
Describe the solution you'd like:
Phase 1: TTS/STT Exploration & Benchmarking
- Model evaluation: Benchmark existing Indic voice models (Hindi, Tamil, Bengali, etc.) against:
  - Accuracy (WER: Word Error Rate for STT; MOS: Mean Opinion Score for TTS)
  - Latency (real-time vs. batch processing)
  - Cost per inference
  - Language/dialect coverage
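For reference, WER is the word-level edit distance between the reference transcript and the STT output, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. A minimal Python sketch, assuming plain whitespace tokenization (a real Indic benchmark would likely also need Unicode and punctuation normalization, which is not shown here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (S + D + I) / N using word-level Levenshtein distance.

    Assumption: words are split on whitespace only; no normalization is done.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous row of the DP table: prev[j] is the edit distance
    # between the first (i - 1) reference words and the first j hypothesis words.
    # Row 0: turning an empty reference into j words takes j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # free if the words match
            deletion = prev[j] + 1                 # drop reference word r
            insertion = curr[j - 1] + 1            # add hypothesis word h
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words gives WER 0.25.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.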
Phase 2: Platform Integration
- Unified API extension:
  - Add voice model support to the /configs endpoint (similar to existing LLM provider configs)
  - Extend /llm/call to handle audio inputs/outputs with automatic transcription
  - Support both streaming (real-time) and batch audio processing
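One possible shape for the /llm/call extension, sketched in Python: audio input is routed through an STT provider before the LLM, and the reply is optionally passed to a TTS provider. The `SpeechToText`, `TextToSpeech` and `LLM` interfaces, the `handle_llm_call` function and its parameters, and the stub providers are all hypothetical illustrations, not Kaapi's actual API. Only batch (whole-utterance) processing is shown; streaming would need a chunked, iterator-based interface instead.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# Hypothetical provider interfaces; concrete classes would wrap real vendors.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, language: str) -> bytes: ...


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


@dataclass
class VoiceCallResult:
    transcript: Optional[str]      # set only when the input was audio
    text_output: str               # the LLM reply as text
    audio_output: Optional[bytes]  # set only when audio output was requested


def handle_llm_call(
    llm: LLM,
    *,
    text: Optional[str] = None,
    audio: Optional[bytes] = None,
    language: str = "hi",
    stt: Optional[SpeechToText] = None,
    tts: Optional[TextToSpeech] = None,
    want_audio: bool = False,
) -> VoiceCallResult:
    """Batch-mode sketch: optional STT -> LLM -> optional TTS."""
    if (text is None) == (audio is None):
        raise ValueError("provide exactly one of text or audio")

    transcript: Optional[str] = None
    if audio is not None:
        if stt is None:
            raise ValueError("audio input requires an STT provider")
        transcript = stt.transcribe(audio, language)  # automatic transcription
        prompt = transcript
    else:
        prompt = text

    reply = llm.complete(prompt)

    audio_out: Optional[bytes] = None
    if want_audio:
        if tts is None:
            raise ValueError("audio output requires a TTS provider")
        audio_out = tts.synthesize(reply, language)
    return VoiceCallResult(transcript, reply, audio_out)


# Stub providers for local testing only; they do no real speech processing.
class EchoLLM:
    def complete(self, prompt: str) -> str:
        return f"You said: {prompt}"


class FakeSTT:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")  # pretend the "audio" bytes are UTF-8 text


class FakeTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")  # pretend encoded text is audio


result = handle_llm_call(
    EchoLLM(), audio=b"namaste", stt=FakeSTT(), tts=FakeTTS(), want_audio=True
)
print(result.transcript, "|", result.text_output)
```

Keeping STT and TTS as separate, optional providers mirrors the idea of configuring them through /configs alongside existing LLM providers, so text-only calls keep working unchanged.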
- Evaluation API enhancement:
  - Add audio-specific evaluation metrics (WER, latency, speaker diarization accuracy)
  - Enable quick A/B testing of different TTS/STT providers
  - Support voice dataset management (upload, version, annotate)
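For A/B testing STT providers on the same labelled audio set, a sketch like the following could aggregate per-utterance results into corpus-level WER and latency percentiles. It assumes each utterance has already been scored (error count from a word-level alignment, reference length, measured latency); the `UtteranceResult` record, the function names and the nearest-rank percentile choice are illustrative assumptions, not an existing Kaapi API. Speaker diarization accuracy is not covered.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UtteranceResult:
    errors: int        # S + D + I for this utterance (from a word-level alignment)
    ref_words: int     # N: number of words in the reference transcript
    latency_ms: float  # time the provider took to return a transcript


def corpus_wer(results: List[UtteranceResult]) -> float:
    # Pool errors and reference words over the whole set rather than averaging
    # per-utterance WERs, so very short utterances do not dominate the score.
    total_ref = sum(r.ref_words for r in results)
    if total_ref == 0:
        raise ValueError("no reference words to score against")
    return sum(r.errors for r in results) / total_ref


def percentile(values: List[float], p: float) -> float:
    # Nearest-rank percentile: smallest value with at least p% of samples <= it.
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def compare_providers(
    runs: Dict[str, List[UtteranceResult]],
) -> Dict[str, Dict[str, float]]:
    """Summarize each provider's run on the same dataset."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, results in runs.items():
        latencies = [r.latency_ms for r in results]
        summary[name] = {
            "wer": corpus_wer(results),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
        }
    return summary


runs = {
    "provider_a": [UtteranceResult(1, 10, 300), UtteranceResult(0, 5, 200),
                   UtteranceResult(2, 5, 400), UtteranceResult(1, 20, 250)],
    "provider_b": [UtteranceResult(3, 10, 150), UtteranceResult(1, 5, 120),
                   UtteranceResult(2, 5, 180), UtteranceResult(2, 20, 160)],
}
for provider, stats in compare_providers(runs).items():
    print(provider, stats)
```

In this made-up example provider_a is more accurate (corpus WER 0.1 vs. 0.2) while provider_b is faster, which is the kind of trade-off the A/B comparison is meant to surface.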