feat: add multi-provider audio generation support by esafwan · Pull Request #135 · tridz-dev/huf

esafwan · 2026-02-07T16:55:57Z

Add text-to-speech functionality using LiteLLM's speech() API for multi-provider audio generation support.

Features:

- Supports OpenAI (tts-1, tts-1-hd), Gemini, ElevenLabs, Azure, Vertex AI, AWS Polly
- Auto-detects TTS model based on provider if not specified
- Supports voice, speed, and format parameters
- Creates Agent Message with kind="Audio" and generated_audio field
- Emits WebSocket events for real-time updates
- Follows same pattern as image generation handler

Technical Details:

- Saves audio files to Frappe File Manager
- Returns comprehensive metadata (url, file_id, message_id, voice, format)
- Adds generated_audio field to Agent Message DocType
- Adds "Audio" kind option to Agent Message

Add _get_default_tts_model() function to auto-select appropriate TTS model based on provider.

Supports:
- OpenAI, Azure: tts-1
- Google, Gemini: gemini-2.5-flash-preview-tts
- ElevenLabs: eleven_multilingual_v2
- AWS: polly
- MiniMax: speech-01

Enables automatic model selection for optimal TTS results per provider.
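The provider-to-model selection described in this commit could look roughly like the sketch below. The model names come from the commit message; the function names, the case-insensitive lookup, and the fallback for unknown providers are assumptions made for illustration, not the PR's actual code.

```python
from typing import Optional

# Provider-to-model table, taken from the commit message.
DEFAULT_TTS_MODELS = {
    "openai": "tts-1",
    "azure": "tts-1",
    "google": "gemini-2.5-flash-preview-tts",
    "gemini": "gemini-2.5-flash-preview-tts",
    "elevenlabs": "eleven_multilingual_v2",
    "aws": "polly",
    "minimax": "speech-01",
}

# Assumption: the PR does not say what happens for an unknown provider,
# so this sketch falls back to the OpenAI default.
FALLBACK_TTS_MODEL = "tts-1"


def get_default_tts_model(provider: Optional[str]) -> str:
    """Return a default TTS model for the provider (case-insensitive lookup)."""
    key = (provider or "").strip().lower()
    return DEFAULT_TTS_MODELS.get(key, FALLBACK_TTS_MODEL)


def resolve_tts_model(provider: Optional[str], model: Optional[str] = None) -> str:
    """An explicitly requested model wins; otherwise use the provider default."""
    return model or get_default_tts_model(provider)
```

Keeping the table as plain data means that supporting another provider is a one-line change, and an explicit model argument always overrides the default.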

Add handle_generate_audio() function for text-to-speech conversion using LiteLLM's speech() API.

Features:
- Multi-provider support (OpenAI, Gemini, ElevenLabs, Azure, AWS, etc.)
- Auto-detects TTS model based on provider if not specified
- Supports voice, speed, and format parameters
- Creates Agent Message with kind="Audio" and generated_audio field
- Saves audio files to Frappe File Manager
- Emits WebSocket events for real-time updates
- Returns comprehensive metadata (url, file_id, message_id, voice, format)

Follows same pattern as image generation handler for consistency.
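The handler flow listed in this commit (synthesize, save the file, create the message, emit an event, return metadata) can be sketched without Frappe or LiteLLM by injecting stand-ins. Everything here is hypothetical: FakeBackend, the speech_fn adapter (assumed to return raw audio bytes), the "audio_generated" event name, and the default voice are illustrative assumptions, not the PR's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeBackend:
    """In-memory stand-in for the file manager, message store, and event bus."""
    files: Dict[str, bytes] = field(default_factory=dict)
    messages: List[dict] = field(default_factory=list)
    events: List[tuple] = field(default_factory=list)

    def save_file(self, name: str, content: bytes) -> dict:
        file_id = f"FILE-{len(self.files) + 1}"
        self.files[file_id] = content
        return {"file_id": file_id, "url": f"/files/{name}"}

    def create_message(self, doc: dict) -> str:
        self.messages.append(doc)
        return f"MSG-{len(self.messages)}"

    def publish(self, event: str, payload: dict) -> None:
        self.events.append((event, payload))


def handle_generate_audio_sketch(
    speech_fn: Callable[..., bytes],
    backend: FakeBackend,
    text: str,
    model: str,
    voice: str = "alloy",          # assumption: default voice is not given in the PR
    speed: float = 1.0,
    response_format: str = "mp3",
) -> dict:
    # 1. Synthesize speech through the injected (hypothetical) adapter.
    audio = speech_fn(model=model, input=text, voice=voice,
                      speed=speed, response_format=response_format)
    # 2. Persist the audio bytes.
    saved = backend.save_file(f"audio.{response_format}", audio)
    # 3. Record a message with kind="Audio" pointing at the saved file.
    message_id = backend.create_message(
        {"kind": "Audio", "generated_audio": saved["url"]}
    )
    # 4. Build the metadata named in the PR and notify listeners.
    result = {
        "url": saved["url"],
        "file_id": saved["file_id"],
        "message_id": message_id,
        "voice": voice,
        "format": response_format,
    }
    backend.publish("audio_generated", result)  # hypothetical event name
    return result
```

In a real deployment the adapter would presumably wrap LiteLLM's speech() call and the backend would wrap Frappe's file and realtime APIs. Injecting both keeps the control flow testable without network access.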

Add support for audio messages in Agent Message DocType:
- Add "Audio" to kind options
- Add generated_audio field (Attach type) for storing generated audio files
- Field is conditionally displayed when kind="Audio"

Enables storing and displaying audio generation results in chat messages.
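As a rough illustration of the conditional field, the definition might resemble the dictionary below. The fieldname, the Attach type, and the kind="Audio" condition come from the commit message; the label and the Python rendering are assumptions (Frappe DocTypes are normally defined in JSON, and the `eval:` expression follows Frappe's usual depends_on convention).

```python
# Hypothetical rendering of the new Agent Message field as a Python dict.
generated_audio_field = {
    "fieldname": "generated_audio",
    "fieldtype": "Attach",
    "label": "Generated Audio",             # assumption: label not stated in the PR
    "depends_on": 'eval:doc.kind=="Audio"',  # shown only for audio messages
}


def is_generated_audio_visible(doc: dict) -> bool:
    """Python equivalent of the depends_on condition, for illustration only."""
    return doc.get("kind") == "Audio"
```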

Add create_generate_audio_tool() function to register the generate_audio tool in Agent Tool Function DocType.

Features:
- Creates/updates generate_audio tool with proper parameters
- Registers tool type "Audio Generation" if not exists
- Defines parameters: input, voice, model, speed, response_format
- Idempotent: can be called multiple times safely
- Integrated into after_install() and after_migrate() hooks

Enables AI agents to use text-to-speech functionality.
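The idempotent create-or-update behaviour described here can be sketched with plain dictionaries standing in for the DocType tables. The tool name, tool type, and parameter names come from the commit message; the parameter types, the required flags, and the in-memory stores are assumptions for illustration.

```python
from typing import Dict, Set

TOOL_NAME = "generate_audio"
TOOL_TYPE = "Audio Generation"

# Parameter names are from the PR; types and required flags are assumed.
TOOL_PARAMETERS = [
    {"name": "input", "type": "string", "required": True},
    {"name": "voice", "type": "string", "required": False},
    {"name": "model", "type": "string", "required": False},
    {"name": "speed", "type": "number", "required": False},
    {"name": "response_format", "type": "string", "required": False},
]


def create_generate_audio_tool_sketch(tool_types: Set[str],
                                      tools: Dict[str, dict]) -> dict:
    """Create the tool if missing, otherwise update it in place."""
    # Adding to a set is a no-op when the tool type already exists.
    tool_types.add(TOOL_TYPE)
    # setdefault creates the record once; later calls reuse the same record.
    doc = tools.setdefault(TOOL_NAME, {"name": TOOL_NAME})
    doc.update({
        "tool_type": TOOL_TYPE,
        "parameters": [dict(p) for p in TOOL_PARAMETERS],
    })
    return doc
```

Because every call converges on the same final state, running it from both an install hook and a migrate hook does not create duplicate records.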

esafwan changed the title ~~feat: add LiteLLM-based audio generation (TTS) handler~~ feat: add multi-provider audio generation support Feb 7, 2026

esafwan added 4 commits February 7, 2026 17:37

esafwan force-pushed the feature/audio-generation-tts branch 2 times, most recently from aaf4144 to 408fdfd Compare February 7, 2026 17:41

esafwan wants to merge 4 commits into develop from feature/audio-generation-tts
