GitHub - DoctorDemon/advanced-voice: A simple implementation of OpenAIs advanced voice mode, featuring interruptions and natural answers, completely for free.

This project is a voice assistant that listens to user speech, transcribes it, and generates AI responses using the Hugging Face API. The assistant can speak the AI responses back to the user but like OpenAI's advanced voice mode, you can interupt the assistant by just speaking while it speaks.
The system uses huggingface's inference API to use specific models (which you can customize) and it works well on the free tier, consuming <1¢ developing it. (NousResearch/Hermes-3-Llama-3.1-8B)
The system also uses Vosk for the speech to text conversion. You can setup your own, completely free huggingface space to do this with almost no latency.

Features

Natural interruption by speaking over the assistant.
Real-time speech recognition using Vosk. (learn how to setup a Vosk API server for free)
AI response generation using Hugging Face's Inference API. (You can customize it to use a provider of your choice)
Text-to-speech conversion using Edge TTS, streaming the response to ensure ultra fast feedback for the natural conversation flow.

Requirements

A Vosk API server
A huggingface token
Python 3.7+
Required Python packages:
- playsound
- pyaudio
- wave
- pydub
- edge_tts
- sounddevice
- websockets
- huggingface_hub

Installation

Clone the repository:

git clone https://github.com/DoctorDemon/advanced-voice.git
cd advanced-voice

Install the required packages:

pip install playsound pyaudio wave pydub edge_tts sounddevice websockets huggingface_hub

Obtain an API key from Hugging Face and replace {YOUR HUGGINGFACE API TOKEN} in assistant.py with your actual API key.

Usage

Run the assistant:
```
python assistant.py
```
The assistant will start listening for your speech. Speak into your microphone, and the assistant will transcribe your speech, generate an AI response, and speak the response back to you.
You can interrupt it to test the interruption feature, the assistant should stop.

Files

assistant.py: Main script for the voice assistant.
speaker.py: Contains functions for text-to-speech conversion streaming from edge TTS (free) and playback control.

Customization

Modify the conversation_history in assistant.py to change the assistant's personality and style.
Adjust the CHUNK_SIZE in speaker.py to change the audio chunk size for playback.

Troubleshooting

Ensure your microphone is properly configured and working. you can change the input device the program is using by changing the device ID in the RawInputStream initialisation like this:
```
  with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype='int16',
                         channels=1, callback=callback, device=1)
```
where device ID is your systems audio device ID.
Check your internet connection as the assistant relies on online services.
If you encounter any issues, refer to the error messages for debugging.
If there are any more problems, feel free to open an issue!

You may use these code snippets freely in your projects!

Built by developers, for developers! ❤️

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
assistant.py		assistant.py
speaker.py		speaker.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Features

Requirements

Installation

Usage

Files

Customization

Troubleshooting

You may use these code snippets freely in your projects!

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

DoctorDemon/advanced-voice

Folders and files

Latest commit

History

Repository files navigation

Features

Requirements

Installation

Usage

Files

Customization

Troubleshooting

You may use these code snippets freely in your projects!

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages