
Add speech-to-text, text-to-speech, and ElevenLabs provider #472

Open
patrickdet wants to merge 1 commit into agentjido:main from patrickdet:feat/speech-transcription-elevenlabs

Conversation

@patrickdet

Summary

Adds TTS and STT to req_llm, plus an ElevenLabs provider.

Speech (ReqLLM.Speech)

Text-to-speech through the standard prepare_request(:speech, ...) pipeline. Any provider that implements the operation works.

{:ok, result} = ReqLLM.speak("openai:tts-1", "Hello world", voice: "alloy")
File.write!("hello.mp3", result.audio)

Options cover voice selection, speed, output format (mp3/wav/opus/flac/aac/pcm), language hints, and provider-specific options such as OpenAI's instructions parameter for gpt-4o-mini-tts.
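A sketch of passing several of these options together (the speed and format keyword names here are assumptions for illustration; check the ReqLLM.Speech docs for the exact option names):

```elixir
# Assumed option names (speed:, format:) -- illustrative only.
{:ok, result} =
  ReqLLM.speak("openai:tts-1", "Hello world",
    voice: "alloy",
    speed: 1.1,
    format: :wav
  )

File.write!("hello.wav", result.audio)
```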

Transcription (ReqLLM.Transcription)

Speech-to-text. Takes file paths, raw binary, or base64 audio. Returns text with optional segment timing.

{:ok, result} = ReqLLM.transcribe("groq:whisper-large-v3-turbo", "recording.mp3")
result.text #=> "Hello world"
result.segments #=> [%{text: "Hello world", start_second: 0.0, end_second: 1.2}]

ElevenLabs provider

Speech-only. The ElevenLabs API differs from OpenAI's /audio/speech in several ways:

  • Voice ID in the URL path (/v1/text-to-speech/{voiceId})
  • xi-api-key header instead of Bearer auth
  • Output format as a query param, not body field
  • text/model_id instead of input/model
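Given those differences, the underlying HTTP request presumably looks roughly like the following Req sketch (URL, header, and body shapes are taken from the list above; the voice ID and variable names are illustrative, not the provider module's actual internals):

```elixir
# Illustrative sketch only -- not the provider's real implementation.
voice_id = "your-voice-id"  # voice ID goes in the URL path
api_key = System.fetch_env!("ELEVENLABS_API_KEY")

Req.post!(
  "https://api.elevenlabs.io/v1/text-to-speech/#{voice_id}",
  params: [output_format: "mp3_44100_128"],           # query param, not a body field
  headers: [{"xi-api-key", api_key}],                 # not Bearer auth
  json: %{text: "Hello!", model_id: "eleven_multilingual_v2"}  # text/model_id keys
)
```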

voice_settings (stability, similarity_boost, style, speed) are passed through provider_options. The provider is auto-discovered at startup.

{:ok, result} = ReqLLM.speak(
  %{id: "eleven_multilingual_v2", provider: :elevenlabs},
  "Hello!",
  provider_options: [stability: 0.5, similarity_boost: 0.8]
)

Integration tests

Tagged :integration, excluded by default. Tested against real APIs:

  • ElevenLabs TTS: default voice, voice_settings, language codes
  • OpenAI TTS: tts-1, wav output
  • Groq STT: generate-then-transcribe pattern (OpenAI TTS makes audio, Groq whisper transcribes it) so we don't commit binary fixtures
ELEVENLABS_API_KEY=... OPENAI_API_KEY=... GROQ_API_KEY=... \
  mix test --include integration test/req_llm/integration/
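The generate-then-transcribe pattern used in the Groq STT tests can be sketched using only the public API shown above (the temp-file name is illustrative):

```elixir
# OpenAI TTS produces the audio fixture at runtime...
{:ok, speech} = ReqLLM.speak("openai:tts-1", "Hello world", voice: "alloy")
path = Path.join(System.tmp_dir!(), "fixture.mp3")
File.write!(path, speech.audio)

# ...and Groq whisper transcribes it, so no binary fixture is committed.
{:ok, transcript} = ReqLLM.transcribe("groq:whisper-large-v3-turbo", path)
transcript.text
```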

Test plan

  • ElevenLabs unit tests pass (18 tests)
  • Full suite passes (2370 tests, 0 failures)
  • Integration tests pass against real ElevenLabs, OpenAI, and Groq APIs (8/8)

patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 8e6ea10 to 3d7830c on March 1, 2026 at 21:11
Add speech-to-text transcription (ReqLLM.Transcription) and
text-to-speech generation (ReqLLM.Speech) with provider-agnostic
pipelines that work via prepare_request(:transcription/:speech).

Add ElevenLabs as a speech-only provider with its unique API format
(voice ID in URL path, xi-api-key header, format as query param).

Integration tests verify TTS (ElevenLabs + OpenAI) and STT (Groq
whisper via generate-then-transcribe pattern) against real APIs.
Tagged :integration and excluded by default.
patrickdet force-pushed the feat/speech-transcription-elevenlabs branch from 3d7830c to 9ad9d93 on March 1, 2026 at 21:15