DiscoSpeech

A Discord bot that listens to voice chat, transcribes speech using Whisper, generates responses using Ollama, and speaks back using ElevenLabs TTS (Bark support coming soon).

🚀 Features

Real-time voice transcription using OpenAI's Whisper
AI-powered responses using Ollama (local LLM)
Text-to-speech responses using ElevenLabs (cloud) or Bark (local, soon)
Automatic audio cleanup and management
Configurable logging system

Limits

Not yet suited to group chats
Cannot serve multiple servers at once
No local TTS model option yet (Bark support is planned)

Prerequisites

Python 3.8+
FFmpeg (for audio processing)
Ollama installed and running locally
Discord Bot Token
ElevenLabs API Key and Voice ID
CUDA-compatible GPU recommended (for faster transcription)

Installation

Clone the repository
Install the required Python packages:

pip install -r requirements.txt

Copy config.example.json to config.json and fill in your credentials:

{
    "discord_token": "YOUR_DISCORD_BOT_TOKEN",
    "elevenlabs_api_key": "YOUR_ELEVENLABS_API_KEY",
    "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
    "ollama_host": "http://localhost:11434",
    "ollama_model": "llama3.1:latest",
    "cleanup_responses": false
}
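A minimal sketch of how config.json might be loaded and checked at startup. The load_config helper and the REQUIRED_KEYS set are hypothetical names used for illustration, not the bot's actual code; only the key names come from the example above.

```python
import json
from pathlib import Path

# Keys taken from the example config above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = {
    "discord_token",
    "elevenlabs_api_key",
    "voice_id",
    "ollama_host",
    "ollama_model",
}


def load_config(path="config.json"):
    """Read the JSON config and fail early if a required value is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = sorted(key for key in REQUIRED_KEYS if not config.get(key))
    if missing:
        raise ValueError("config.json is missing values for: " + ", ".join(missing))
    # cleanup_responses is treated as optional and defaults to False,
    # matching the value shown in the example.
    config.setdefault("cleanup_responses", False)
    return config
```

Failing at startup with a list of missing keys is usually easier to debug than an authentication error surfacing later from Discord or ElevenLabs.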

Configuration

Discord Bot Setup

Go to the Discord Developer Portal
Create a new application
Add a bot to your application
Enable the Voice State and Message Content intents
Copy the bot token to your config.json

ElevenLabs Setup

Create an account at ElevenLabs
Get your API key from the profile settings
Choose a voice and copy its ID
Add both to your config.json

Ollama Setup

Install Ollama from ollama.ai
Pull your preferred model (mistral in this example) and set ollama_model in config.json to the same name:

ollama pull mistral
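To confirm that the model named in config.json has actually been pulled, one option is to ask the Ollama server which models it has locally. The sketch below uses Ollama's GET /api/tags endpoint; the helper names (installed_models, model_is_available, fetch_tags) are hypothetical and not part of this repository.

```python
import json
import urllib.request


def installed_models(tags_response):
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_response.get("models", [])]


def model_is_available(wanted, names):
    """Match an exact tag, treating a bare name like 'mistral' as 'mistral:latest'."""
    if ":" not in wanted:
        wanted = wanted + ":latest"
    return wanted in names


def fetch_tags(host="http://localhost:11434"):
    """Query a running Ollama server for its locally pulled models."""
    with urllib.request.urlopen(host + "/api/tags", timeout=5) as response:
        return json.load(response)
```

With the Ollama service running, `model_is_available("mistral", installed_models(fetch_tags()))` should be true once the pull has finished.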

Commands

!join - Bot joins your current voice channel
!leave - Bot leaves the voice channel

💻 Usage

Start the Ollama service
Run the bot:

python main.py

Invite the bot to your Discord server
Join a voice channel
Use !join to make the bot join
The bot will:
- Listen to voice chat
- Transcribe speech in real-time
- Generate responses using Ollama
- Speak responses using ElevenLabs TTS (a local Bark TTS model is planned)
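The listen, transcribe, generate, speak flow above can be pictured as three stages chained together. The following is an illustrative sketch only: VoicePipeline and its field names are hypothetical, and the real bot's service classes may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Chains the three stages; each one is injected so backends can be swapped."""

    transcribe: Callable[[bytes], str]   # e.g. a wrapper around Whisper
    generate: Callable[[str], str]       # e.g. a client for Ollama
    synthesize: Callable[[str], bytes]   # e.g. a client for ElevenLabs

    def handle_utterance(self, audio: bytes) -> Optional[bytes]:
        """Return audio for the reply, or None when nothing was transcribed."""
        text = self.transcribe(audio).strip()
        if not text:
            return None  # silence or noise: skip the LLM and TTS calls
        reply = self.generate(text)
        return self.synthesize(reply)


# Stub stages stand in for the real services in this example.
pipeline = VoicePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda text: text.upper(),
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Keeping the stages as plain callables means a local TTS backend could later replace the cloud one without touching the rest of the flow.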

Project Structure

bot/ - Main bot module
- services/ - Core services (audio, TTS, LLM)
- voice/ - Voice processing components
- utils/ - Utility functions
temp/ - Temporary audio files
responses/ - Generated audio responses
logs/ - Application logs
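The README does not say how the temp/ and responses/ directories are cleaned up. One possible approach, sketched below under the assumption that cleanup is age-based, is to delete files older than a cutoff. The remove_old_files helper is hypothetical and not taken from the repository.

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_seconds, now=None):
    """Delete regular files older than max_age_seconds; return their names, sorted."""
    now = time.time() if now is None else now
    removed = []
    folder = Path(directory)
    if not folder.is_dir():
        return removed  # nothing to clean if the directory does not exist
    for path in folder.iterdir():
        # Compare the file's last-modified time against the cutoff.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, `remove_old_files("temp", 600)` would remove temporary audio files that have not been modified in the last ten minutes.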

Logging

Logs are written to logs/bot.log. The file rotates automatically at 10 MB, and 5 backups are kept.
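A rotation policy like this can be expressed with the standard library's RotatingFileHandler. This is a sketch of an equivalent setup, not necessarily the bot's own logging code; the build_logger function, the logger name, and the format string are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_path="logs/bot.log", name="discospeech"):
    """Create a logger that rotates at 10 MB and keeps 5 backup files."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        log_path,
        maxBytes=10 * 1024 * 1024,  # rotate once the file reaches about 10 MB
        backupCount=5,              # keep bot.log.1 through bot.log.5
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

With backupCount set to 5, the oldest backup is discarded on each rotation, so disk usage stays bounded at roughly 60 MB.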

Citation

If you use this repository or its data in a downstream project, please consider citing it with:

@misc{discospeech,
  author = {AJR},
  title = {DiscoSpeech: Realistic Discord voice chat AI},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ajr-dev/discospeech}},
}

🌟 Star history

License

MIT License

🙇 Acknowledgements

DiscoSpeech couldn't have been built without the help of great software already available. Thank you!

🤗 Contributors

This is a community project. Special thanks to our contributors! 🤗
