This project is a voice assistant that listens to user speech, transcribes it, and generates AI responses using the Hugging Face API. The assistant can speak the AI responses back to the user but like OpenAI's advanced voice mode, you can interupt the assistant by just speaking while it speaks.
The system uses huggingface's inference API to use specific models (which you can customize) and it works well on the free tier, consuming <1¢ developing it. (NousResearch/Hermes-3-Llama-3.1-8B)
The system also uses Vosk for the speech to text conversion. You can setup your own, completely free huggingface space to do this with almost no latency.
- Natural interruption by speaking over the assistant.
- Real-time speech recognition using Vosk. (learn how to setup a Vosk API server for free)
- AI response generation using Hugging Face's Inference API. (You can customize it to use a provider of your choice)
- Text-to-speech conversion using Edge TTS, streaming the response to ensure ultra fast feedback for the natural conversation flow.
- A Vosk API server
- A huggingface token
- Python 3.7+
- Required Python packages:
playsoundpyaudiowavepydubedge_ttssounddevicewebsocketshuggingface_hub
-
Clone the repository:
git clone https://github.com/DoctorDemon/advanced-voice.git cd advanced-voice -
Install the required packages:
pip install playsound pyaudio wave pydub edge_tts sounddevice websockets huggingface_hub
-
Obtain an API key from Hugging Face and replace
{YOUR HUGGINGFACE API TOKEN}inassistant.pywith your actual API key.
-
Run the assistant:
python assistant.py
-
The assistant will start listening for your speech. Speak into your microphone, and the assistant will transcribe your speech, generate an AI response, and speak the response back to you.
You can interrupt it to test the interruption feature, the assistant should stop.
assistant.py: Main script for the voice assistant.speaker.py: Contains functions for text-to-speech conversion streaming from edge TTS (free) and playback control.
- Modify the
conversation_historyinassistant.pyto change the assistant's personality and style. - Adjust the
CHUNK_SIZEinspeaker.pyto change the audio chunk size for playback.
-
Ensure your microphone is properly configured and working. you can change the input device the program is using by changing the device ID in the RawInputStream initialisation like this:
with sd.RawInputStream(samplerate=16000, blocksize=8000, dtype='int16', channels=1, callback=callback, device=1)
where device ID is your systems audio device ID.
-
Check your internet connection as the assistant relies on online services.
-
If you encounter any issues, refer to the error messages for debugging.
-
If there are any more problems, feel free to open an issue!
Built by developers, for developers! ❤️
