TTS/STT: Feature integration #513

**Is your feature request related to a problem? Please describe.**
Currently, Kaapi only supports text-based user inputs. This limits our ability to:

- Process audio and voice-based inputs from users
- Support voice-enabled chatbot experiences for partners

**Describe the solution you'd like:**

**Phase 1: TTS/STT Exploration & Benchmarking**

- Model evaluation: Benchmark existing Indic voice models (Hindi, Tamil, Bengali, etc.) on the following criteria:

   - Accuracy (WER - Word Error Rate for STT, MOS - Mean Opinion Score for TTS)
   - Latency (real-time vs batch processing)
   - Cost per inference
   - Language/dialect coverage
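
To make the STT accuracy criterion concrete, here is a minimal sketch of a WER computation, using the standard definition (word-level edit distance divided by the number of reference words). The function name `word_error_rate` is illustrative and not part of Kaapi. The sketch assumes whitespace tokenization and no text normalization; a real benchmark for Indic languages would probably also need Unicode and punctuation normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 0.25
```

Because WER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many inserted words.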

**Phase 2: Platform Integration**

- Unified API extension:

   - Add voice model support to /configs endpoint (similar to existing LLM provider configs)
   - Extend /llm/call to handle audio inputs/outputs with automatic transcription
   - Support both streaming (real-time) and batch audio processing

- Evaluation API enhancement:

   - Add audio-specific evaluation metrics (WER, latency, speaker diarization accuracy)
   - Enable quick A/B testing of different TTS/STT providers
   - Support voice dataset management (upload, version, annotate)
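
As a starting point for discussing the unified API extension, here is a rough sketch of what a voice config and an audio request body could look like. Everything in it is an assumption for illustration: the `VoiceModelConfig` fields, the `build_audio_call_payload` helper, and the payload keys are hypothetical and do not reflect the current /configs or /llm/call schemas.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class VoiceModelConfig:
    """Hypothetical voice entry that could be stored alongside LLM provider configs."""
    provider: str            # placeholder provider name
    task: str                # "stt" or "tts"
    model: str               # provider-specific model identifier
    language: str            # language tag such as "hi-IN"
    streaming: bool = False  # False means batch processing

    def __post_init__(self) -> None:
        if self.task not in ("stt", "tts"):
            raise ValueError(f"unsupported task: {self.task!r}")


def build_audio_call_payload(
    config: VoiceModelConfig, audio_bytes: bytes, audio_format: str = "wav"
) -> dict:
    """Build a JSON-serializable request body carrying base64-encoded audio (assumed shape)."""
    if config.task != "stt":
        raise ValueError("audio input requires an 'stt' config")
    return {
        "config": asdict(config),
        "input": {
            "type": "audio",
            "format": audio_format,
            "data": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


config = VoiceModelConfig(
    provider="example-provider", task="stt", model="example-model", language="hi-IN"
)
payload = build_audio_call_payload(config, b"\x00\x01\x02")
print(json.dumps(payload, indent=2))
```

Base64 in a JSON body is only one option for batch requests; streaming input would likely need a different transport, which this sketch does not cover.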
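
For the A/B testing item above, one possible way to compare two STT providers is to run both on the same utterances and summarize WER and latency for each. The names `UtteranceResult`, `summarize`, and `compare_providers` are hypothetical, and the p95 latency uses the nearest-rank method as an assumed convention.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class UtteranceResult:
    """Per-utterance measurement for one provider (hypothetical record shape)."""
    wer: float
    latency_ms: float


def summarize(results: list[UtteranceResult]) -> dict:
    """Aggregate mean WER and nearest-rank p95 latency over a set of utterances."""
    if not results:
        raise ValueError("at least one result is required")
    latencies = sorted(r.latency_ms for r in results)
    rank = max(1, math.ceil(0.95 * len(latencies)))  # nearest-rank percentile
    return {
        "n": len(results),
        "mean_wer": mean(r.wer for r in results),
        "p95_latency_ms": latencies[rank - 1],
    }


def compare_providers(a: list[UtteranceResult], b: list[UtteranceResult]) -> dict:
    """Summarize both providers and report which one has the lower mean WER."""
    summary_a, summary_b = summarize(a), summarize(b)
    lower = "A" if summary_a["mean_wer"] <= summary_b["mean_wer"] else "B"
    return {"A": summary_a, "B": summary_b, "lower_mean_wer": lower}


provider_a = [UtteranceResult(0.25, 100.0), UtteranceResult(0.5, 200.0)]
provider_b = [UtteranceResult(0.5, 80.0), UtteranceResult(0.75, 90.0)]
print(compare_providers(provider_a, provider_b))
```

A comparison like this is only meaningful when both providers are evaluated on the same dataset version, and small samples should be treated as indicative rather than conclusive.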

