A simple web application that records audio from Omi devices and transcribes it using OpenAI's Whisper model.

Features:
- Record audio from client devices
- Store audio files on the server
- Transcribe audio files using OpenAI's Whisper model
- View and select from all previously recorded audio files
- On-demand transcription of any stored audio file
- Responsive UI for both desktop and mobile

Requirements:
- Python 3.6+
- FFmpeg (for audio processing)
- OpenAI API key
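The README does not say exactly how the server uses FFmpeg. As an illustration only, a common preprocessing step before sending audio to Whisper is converting an arbitrary input file to mono WAV. The sketch below assumes that step: `build_ffmpeg_command`, `convert_to_wav`, and the 16 kHz default are hypothetical choices, while `-y`, `-i`, `-ac`, and `-ar` are standard FFmpeg options.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that converts `src` to a mono WAV file.

    Illustrative only: the target format (mono, 16 kHz by default) is an
    assumption, not necessarily what this server does internally.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (any format FFmpeg can decode)
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        dst,
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 16000) -> None:
    """Run the conversion, failing early if FFmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg not found on PATH; install it first")
    # check=True raises CalledProcessError if FFmpeg exits with a non-zero status
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems with file names.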

Local installation:
- Clone this repository
- Install the required Python packages: `pip install -r requirements.txt`
- Install FFmpeg if you don't have it already:
  - Mac (using Homebrew): `brew install ffmpeg`
  - Ubuntu/Debian: `sudo apt-get install ffmpeg`
  - Windows: download it from the FFmpeg website
- Create a `.env` file in the project root and add your OpenAI API key:
  `OPENAI_API_KEY="your_api_key_here"`

Docker installation:
- Clone this repository
- Create a `.env` file as described above
- Build the Docker image: `docker build -t omi-audio-transcriber .`
- Run the container: `docker run -p 5000:5000 -v $(pwd)/saved_audio:/app/saved_audio -v $(pwd)/transcripts:/app/transcripts --env-file .env omi-audio-transcriber`

Usage:
- Start the server (local installation): `python run_server.py`
- The server will run at http://localhost:5001 (or port 5000 if using Docker)

To record audio:
- POST audio data to the root endpoint (`/`) with the query parameters `uid` and `sample_rate`. Example:
  `curl -X POST --data-binary @audio.wav "http://localhost:5001/?uid=user123&sample_rate=44100"`
To transcribe existing audio files:
- Navigate to http://localhost:5001/transcribe in your browser (or port 5000 if using Docker)
- Select an audio file from the list
- Click the "Transcribe" button
- View the transcribed text in the right panel
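The recording request from the curl example can also be sent from Python using only the standard library. This is a minimal sketch: `make_silent_wav`, `build_upload_request`, and `upload` are hypothetical helper names, and the `application/octet-stream` content type is an assumption, since the README only shows a raw-body POST with `uid` and `sample_rate`.

```python
import io
import urllib.parse
import urllib.request
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 44100) -> bytes:
    """Create a mono 16-bit WAV of silence in memory, handy for testing uploads."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_upload_request(audio: bytes, uid: str, sample_rate: int,
                         base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Mirror the curl example: POST the raw bytes to / with uid and sample_rate."""
    query = urllib.parse.urlencode({"uid": uid, "sample_rate": sample_rate})
    return urllib.request.Request(
        base_url.rstrip("/") + "/?" + query,
        data=audio,  # sent as the raw request body, like curl --data-binary
        method="POST",
        # Assumed content type; the README does not specify one
        headers={"Content-Type": "application/octet-stream"},
    )


def upload(audio: bytes, uid: str, sample_rate: int,
           base_url: str = "http://localhost:5001") -> int:
    """Send the upload and return the HTTP status code."""
    request = build_upload_request(audio, uid, sample_rate, base_url)
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

For example, `upload(make_silent_wav(), "user123", 44100)` posts one second of silence to a locally running server and returns the status code; use `base_url="http://localhost:5000"` with the Docker setup.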
To make the service accessible from other devices, you can use ngrok.
- Install ngrok: `brew install ngrok` (or download it from the ngrok website)
- Run ngrok to expose the local port: `ngrok http 5001` (`ngrok http 5000` if using Docker)
  - You may need to sign up for a free ngrok account to use this command.
- Use the forwarding URL that ngrok provides (e.g., https://your-ngrok-url.ngrok.io) to access the service from other devices.
See ACCESS.md for more detailed instructions on installing FFmpeg and on accessing the Omi Audio Transcriber from other devices on your local network.

API endpoints:
- `GET /`: Health check
- `POST /`: Upload audio
- `GET /transcript/<filename>`: Get a specific transcript
- `GET /transcribe`: Web interface for selecting and transcribing audio files
- `GET /list_audio_files`: API to get a list of all available audio files
- `GET /transcribe_audio/<filename>`: API to transcribe a specific audio file
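The endpoints above can also be called from a script. The sketch below assumes the server is reachable at http://localhost:5001 and returns response bodies as plain text, because the README does not document the response format (JSON is plausible but unconfirmed). All function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Local default; the Docker setup publishes port 5000 instead.
BASE_URL = "http://localhost:5001"


def endpoint(path: str, filename: str = "", base_url: str = BASE_URL) -> str:
    """Join the base URL, a route such as "/transcript", and an optional file name."""
    url = base_url.rstrip("/") + path
    if filename:
        # Percent-encode spaces and other characters that are unsafe in a URL path
        url += "/" + urllib.parse.quote(filename)
    return url


def fetch_text(url: str, timeout: float = 120.0) -> str:
    """GET a URL and return the body as text (its exact format is not documented)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_server_up(base_url: str = BASE_URL, timeout: float = 3.0) -> bool:
    """Health check: True if GET / answers with HTTP 200."""
    try:
        with urllib.request.urlopen(endpoint("/", base_url=base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


def list_audio_files(base_url: str = BASE_URL) -> str:
    return fetch_text(endpoint("/list_audio_files", base_url=base_url))


def transcribe(filename: str, base_url: str = BASE_URL) -> str:
    # Transcription waits on the OpenAI API, hence the generous default timeout
    return fetch_text(endpoint("/transcribe_audio", filename, base_url))


def get_transcript(filename: str, base_url: str = BASE_URL) -> str:
    return fetch_text(endpoint("/transcript", filename, base_url))
```

A typical session checks `is_server_up()`, calls `list_audio_files()` to see which names are available, and then passes one of those names to `transcribe(...)`.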

Directory structure:
- `saved_audio/`: Stores all uploaded audio files
- `transcripts/`: Stores all transcription results
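Because uploads and results are plain files, you can also inspect them directly on the host (with Docker, the `-v` mounts in the run command map both directories into the project root). A minimal sketch, assuming it runs from the project root; `list_files` is a hypothetical helper, and the newest-first ordering is a choice made here rather than something the README specifies.

```python
from pathlib import Path
from typing import List


def list_files(root: str, subdir: str) -> List[str]:
    """Return file names in root/subdir, newest first by modification time."""
    directory = Path(root) / subdir
    if not directory.is_dir():
        return []  # e.g. nothing has been uploaded yet
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files]


if __name__ == "__main__":
    for subdir in ("saved_audio", "transcripts"):
        print(subdir, list_files(".", subdir))
```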

Troubleshooting:
- If audio processing doesn't work, make sure FFmpeg is installed and available on your PATH
- If transcription fails, check that your OpenAI API key is valid and set correctly in the .env file
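Both troubleshooting checks can be automated before starting the server. This sketch assumes the key may come from either the process environment or the `.env` file in the current directory; `read_env_file` is a deliberately simplified stand-in for a real `.env` loader (no escapes or multi-line values), and `preflight` is a hypothetical helper name. It cannot tell whether the key is *valid*, only whether one is set.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List

PLACEHOLDER = "your_api_key_here"  # the example value from the setup instructions


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and '#' comments."""
    values = {}  # type: Dict[str, str]
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. OPENAI_API_KEY="..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(env_path: str = ".env") -> List[str]:
    """Return a list of setup problems; an empty list means both checks passed."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg is not installed or not on PATH")
    # Check the process environment first, then fall back to the .env file
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(env_path).get("OPENAI_API_KEY")
    if not key or key == PLACEHOLDER:
        problems.append("OPENAI_API_KEY is missing or still the placeholder value")
    return problems
```

Run `print(preflight())` from the project root; an empty list means FFmpeg was found and the key is set to something other than the placeholder.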