A Discord bot that listens to voice chat, transcribes speech with Whisper, generates responses with Ollama, and speaks back using ElevenLabs TTS (local Bark TTS support is coming soon).
- Real-time voice transcription using OpenAI's Whisper
- AI-powered responses using Ollama (local LLM)
- Text-to-speech responses using ElevenLabs (cloud) or Bark (local, coming soon)
- Automatic cleanup and management of temporary and generated audio files
- Configurable logging system
Current limitations:
- Not yet group-chat friendly
- Cannot serve multiple servers at once
- No local TTS model option yet
- Python 3.8+
- FFmpeg (for audio processing)
- Ollama installed and running locally
- Discord Bot Token
- ElevenLabs API Key and Voice ID
- CUDA-compatible GPU recommended (for faster transcription)
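Before installing, it can help to confirm the basic requirements above with a small preflight check. This is a minimal sketch and not part of the repository: `check_prerequisites` is a hypothetical helper, and it only covers the Python version and whether `ffmpeg` is on `PATH`. It does not verify Ollama, the Discord or ElevenLabs credentials, or the GPU.

```python
import shutil
import sys
from typing import List, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed.

    Hypothetical helper: only the Python version and FFmpeg are checked.
    """
    problems: List[str] = []
    # Python 3.8+ is listed as a requirement.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # FFmpeg must be installed and discoverable on PATH for audio processing.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `check_prerequisites()` and printing each returned string gives a quick checklist of what still needs installing.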
- Clone the repository
- Install the required Python packages:
pip install -r requirements.txt
- Copy config.example.json to config.json and fill in your credentials:
{
  "discord_token": "YOUR_DISCORD_BOT_TOKEN",
  "elevenlabs_api_key": "YOUR_ELEVENLABS_API_KEY",
  "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
  "ollama_host": "http://localhost:11434",
  "ollama_model": "llama3.1:latest",
  "cleanup_responses": false
}
- Go to the Discord Developer Portal
- Create a new application
- Add a bot to your application
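- Enable the Message Content Intent under Privileged Gateway Intents (the Voice States intent is not privileged, so it needs no toggle in the portal)
- Copy the bot token into your config.json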
- Create an account at ElevenLabs
- Get your API key from the profile settings
- Choose a voice and copy its ID
- Add both to your config.json
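Since all of the credentials end up in config.json, a fail-fast loader catches missing values or leftover placeholders before the bot connects. This is a minimal sketch under assumptions: `load_config`, `REQUIRED_KEYS`, and `DEFAULTS` are hypothetical names, the required keys are taken from the example config.json above, and treating `cleanup_responses` as optional (default `false`) is an assumption, not confirmed project behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Keys taken from the example config.json; which ones are required is an assumption.
REQUIRED_KEYS = ("discord_token", "elevenlabs_api_key", "voice_id", "ollama_host", "ollama_model")
# Hypothetical default for the optional cleanup flag.
DEFAULTS: Dict[str, Any] = {"cleanup_responses": False}


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Load the config, fill optional defaults, and fail fast on empty or placeholder values."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    config = {**DEFAULTS, **data}
    missing = [k for k in REQUIRED_KEYS if not config.get(k)]
    if missing:
        raise ValueError("config.json is missing values for: " + ", ".join(missing))
    # Catch values still set to the YOUR_... placeholders used in the example above.
    placeholders = [k for k in REQUIRED_KEYS if str(config[k]).startswith("YOUR_")]
    if placeholders:
        raise ValueError("replace the placeholder values for: " + ", ".join(placeholders))
    return config
```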
- Install Ollama from ollama.ai
- Pull your preferred model (and set "ollama_model" in config.json to the same name):
ollama pull mistral
Bot commands:
- !join - Bot joins your current voice channel
- !leave - Bot leaves the voice channel
- Start the Ollama service
- Run the bot:
python main.py
- Invite the bot to your Discord server
- Join a voice channel
- Use !join to make the bot join
- The bot will:
  - Listen to voice chat
  - Transcribe speech in real time
  - Generate responses using Ollama
  - Speak responses using ElevenLabs TTS (a local Bark TTS model is planned)
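The response step sends each transcript to the local Ollama server configured by `ollama_host` and `ollama_model`. The project's own client code is not shown here, so this is a standard-library sketch that assumes a non-streaming call to Ollama's HTTP `/api/generate` endpoint. `build_generate_request` and `generate_reply` are hypothetical names, and the bot may instead use the `ollama` Python package or the chat endpoint.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload: Dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_reply(host: str, model: str, transcript: str, timeout: float = 120.0) -> str:
    """Send transcribed speech to Ollama and return the generated text."""
    request = build_generate_request(host, model, transcript)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose "response" field holds the text.
    return body["response"]
```

With Ollama running, `generate_reply("http://localhost:11434", "llama3.1:latest", "Hello there")` returns the model's reply as a string, which would then be passed to the TTS step.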
- bot/ - Main bot module
- services/ - Core services (audio, TTS, LLM)
- voice/ - Voice processing components
- utils/ - Utility functions
- temp/ - Temporary audio files
- responses/ - Generated audio responses
- logs/ - Application logs
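The temp/ and responses/ folders are where audio accumulates, and the `cleanup_responses` option controls whether generated responses are removed. How the project actually does this is not documented here; this is a minimal sketch in which `cleanup_audio` is a hypothetical helper and the `.wav`/`.mp3` extensions are assumptions about the audio formats in use.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def cleanup_audio(
    dirs: Iterable[str] = ("temp", "responses"),
    suffixes: Tuple[str, ...] = (".wav", ".mp3"),  # assumed audio formats
) -> List[Path]:
    """Delete audio files in the given directories and return the paths removed."""
    removed: List[Path] = []
    for directory in dirs:
        folder = Path(directory)
        if not folder.is_dir():
            continue  # Nothing to clean if the folder was never created.
        for path in folder.iterdir():
            # Only delete regular files with an audio extension (case-insensitive).
            if path.is_file() and path.suffix.lower() in suffixes:
                path.unlink()
                removed.append(path)
    return removed
```

One way to honor the flag would be `cleanup_audio(["temp", "responses"] if cleanup_responses else ["temp"])`, so temporary recordings are always cleared and generated responses only when `cleanup_responses` is `true`.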
Logs are written to logs/bot.log and rotated automatically once the file reaches 10 MB, keeping 5 backup files.
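This rotation policy maps directly onto Python's standard `RotatingFileHandler`. The sketch below is an illustration, not the project's logging module: `setup_logging` and the `"discospeech"` logger name are hypothetical, the format string is an assumption, and 10 MB is taken as 10 * 1024 * 1024 bytes.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to <log_dir>/bot.log, rotated at 10 MB with 5 backups."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        Path(log_dir) / "bot.log",
        maxBytes=10 * 1024 * 1024,  # roll over once bot.log reaches about 10 MB
        backupCount=5,  # keep bot.log.1 through bot.log.5
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("discospeech")  # hypothetical logger name
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```

Call it once at startup; each additional call attaches another handler and would duplicate every log line.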
If you use this repository or its data in a downstream project, please consider citing it as:
@misc{discospeech,
author = {AJR},
title = {DiscoSpeech: Realistic discord voice chat AI},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ajr-dev/discospeech}},
}
DiscoSpeech couldn't have been built without the help of great software already available. Thank you!
This is a community project; special thanks to our contributors! 🤗