A cross-platform voice-to-text transcription application that allows users to quickly transcribe speech using hotkey activation. The application captures audio from the microphone, transcribes it using speech-to-text technology, and automatically inserts the transcribed text at the cursor position.
- Modern GUI Interface: Professional system tray application with settings management
- Global Hotkey Activation: Press a configurable hotkey to start/stop recording
- Multi-Language Support: 20+ languages with auto-detection or manual selection
- Intelligent Text Substitutions: Automatic correction of technical terms (e.g., "superbase" → "Supabase")
- Dual Transcription Modes: Local Whisper models or OpenAI Whisper API
- Terminal-Compatible Text Insertion: Smart Ctrl+Shift+V insertion that works in terminals
- Live Configuration Reload: Settings changes take effect immediately without restart
- Transcription History: Real-time log of recent transcriptions with timestamps
- Cross-platform: Works on Windows, macOS, and Linux
- Simple Configuration: GUI-based settings with JSON persistence
Option 1: Install from PyPI (not yet published; use Option 2 for now)

```bash
# Install with uv (recommended) - coming soon!
uv add voicebox

# Or with pip - coming soon!
pip install voicebox

# Run immediately
voicebox --gui
```

Option 2: Install from Source (currently the recommended method)

```bash
# Clone and install
git clone <repository-url>
cd voicebox
uv sync
uv run python gui.py
```

Prerequisites:
- Python 3.8+
- uv package manager
- Microphone access
- For API mode: OpenAI API key
Setup:

```bash
# Clone repository
git clone <repository-url>
cd voicebox

# Install with uv (recommended)
make dev
# OR manually: uv sync

# Run from source
make run
# OR manually: uv run python src/main.py
```

GUI Mode (Recommended):
- From executable: `voicebox --gui` (after installation)
- From source: `uv run python gui.py`

CLI Mode:
- From executable: just run `voicebox` (after installation)
- From source: `make run` or `uv run python src/main.py`
Using VoiceBox:
1. Press the hotkey (default: Ctrl+Space) to start recording
2. Speak into your microphone
3. Press the hotkey again to stop recording and transcribe
4. The transcribed text is automatically inserted at your cursor
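The press-to-start, press-again-to-stop flow above is essentially a two-state toggle. A minimal sketch of that state machine (the `RecordingToggle` class and its callbacks are illustrative names, not VoiceBox's actual API):

```python
class RecordingToggle:
    """Toggles between idle and recording on each hotkey press."""

    def __init__(self, on_start, on_stop):
        self.recording = False
        self.on_start = on_start  # called when recording begins
        self.on_stop = on_stop    # called when recording ends; would trigger transcription

    def hotkey_pressed(self):
        if self.recording:
            self.recording = False
            self.on_stop()
        else:
            self.recording = True
            self.on_start()


# Two presses simulate one record-and-transcribe cycle
events = []
toggle = RecordingToggle(lambda: events.append("start"),
                         lambda: events.append("stop"))
toggle.hotkey_pressed()  # starts recording
toggle.hotkey_pressed()  # stops recording
print(events)  # ['start', 'stop']
```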
Configuration is stored in a JSON file in your system's config directory:
- Windows: `%APPDATA%\VoiceBox\config.json`
- macOS: `~/Library/Application Support/VoiceBox/config.json`
- Linux: `~/.config/VoiceBox/config.json`
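Resolving that per-OS location can be done with the standard library alone. A sketch of the pattern (VoiceBox's own path logic may differ in detail):

```python
import os
import sys
from pathlib import Path


def config_path() -> Path:
    """Return the conventional per-OS VoiceBox config file location."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", Path.home()))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:  # Linux and other Unix: honor XDG_CONFIG_HOME if set
        base = Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config"))
    return base / "VoiceBox" / "config.json"


print(config_path())
```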
```json
{
  "transcription_mode": "local",
  "hotkey": "ctrl+space",
  "api_key": "",
  "local_model_size": "base",
  "transcription_language": "auto"
}
```
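A common way to read such a file is to merge it over a defaults dictionary, so a partial or older config still yields every key. A sketch of that general pattern (not VoiceBox's exact loader):

```python
import json
import tempfile
from pathlib import Path

DEFAULTS = {
    "transcription_mode": "local",
    "hotkey": "ctrl+space",
    "api_key": "",
    "local_model_size": "base",
    "transcription_language": "auto",
}


def load_config(path: Path) -> dict:
    """Start from the defaults, then overlay whatever the file provides."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text()))
    return config


# A partial file overrides only the keys it contains
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"hotkey": "ctrl+alt+v"}))
    cfg = load_config(path)

print(cfg["hotkey"])              # ctrl+alt+v (overridden by the file)
print(cfg["transcription_mode"])  # local (falls back to the default)
```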
- `local`: Uses faster-whisper for offline transcription
  - Models: `tiny`, `base`, `small`, `medium`, `large-v2`, `large-v3`
  - No internet required after the initial model download
- `api`: Uses the OpenAI Whisper API
  - Requires an internet connection and an API key
  - Generally faster and more accurate
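Two interchangeable backends like these typically share one interface, which is the role of `src/transcription/base.py`. The sketch below uses a stand-in `EchoService` so it runs without models or API keys; the exact method signatures and the `make_service` factory are assumptions, not VoiceBox's real code:

```python
from abc import ABC, abstractmethod


class TranscriptionService(ABC):
    """Shared interface both backends implement (the role of base.py)."""

    @abstractmethod
    def transcribe(self, audio_file: str, language: str = "auto") -> str:
        """Return the transcribed text for the given audio file."""


class EchoService(TranscriptionService):
    """Stand-in backend so this sketch runs without models or keys."""

    def transcribe(self, audio_file, language="auto"):
        return f"transcribed {audio_file} ({language})"


def make_service(mode: str) -> TranscriptionService:
    # In VoiceBox the "transcription_mode" setting would select the
    # local Whisper backend ("local") or the OpenAI API backend ("api").
    backends = {"local": EchoService, "api": EchoService}
    return backends[mode]()


print(make_service("local").transcribe("clip.wav"))  # transcribed clip.wav (auto)
```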
VoiceBox supports 20+ languages for improved accuracy:
- auto: Auto-detect language (default)
- en: English, es: Spanish, fr: French, de: German
- ja: Japanese, zh: Chinese, ru: Russian, ar: Arabic
- And many more available in the GUI settings
Specifying the correct language significantly improves transcription accuracy for technical terms.
Common hotkey combinations:
- `ctrl+space` (default)
- `ctrl+shift+v`
- `ctrl+alt+v`
- `button9` (mouse side button)
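Hotkey strings in this form are usually parsed into a set of modifiers plus one key. A minimal sketch of such a parser (illustrative only, not VoiceBox's actual implementation):

```python
MODIFIERS = {"ctrl", "shift", "alt", "cmd"}


def parse_hotkey(spec: str):
    """Split a spec like "ctrl+shift+v" into (modifier set, key)."""
    parts = [p.strip().lower() for p in spec.split("+")]
    mods = frozenset(p for p in parts if p in MODIFIERS)
    keys = [p for p in parts if p not in MODIFIERS]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one non-modifier key in {spec!r}")
    return mods, keys[0]


print(parse_hotkey("ctrl+space"))
print(parse_hotkey("button9"))  # mouse buttons have no modifiers
```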
- Get an OpenAI API key from https://platform.openai.com/api-keys
- Set the API key in the config:

```json
{
  "transcription_mode": "api",
  "api_key": "your-api-key-here"
}
```

VoiceBox includes intelligent text correction for commonly misheard technical terms:
Built-in corrections include:
- `superbase` → `Supabase`
- `versel` → `Vercel`
- `get hub` → `GitHub`
- `java script` → `JavaScript`
- `mongo db` → `MongoDB`
- `a p i` → `API`
- And 70+ more technical terms
Managing substitutions:
- Open Settings → Substitutions tab
- Add custom corrections for your specific terminology
- Import/export substitution lists
- Changes take effect immediately (no restart needed)
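Substitutions like these can be applied with plain regular expressions, replacing the longest phrases first so multi-word rules win. An illustrative sketch with a handful of the built-in rules (the real `SubstitutionManager` may work differently):

```python
import re

SUBSTITUTIONS = {
    "superbase": "Supabase",
    "versel": "Vercel",
    "get hub": "GitHub",
    "java script": "JavaScript",
    "mongo db": "MongoDB",
    "a p i": "API",
}


def apply_substitutions(text: str, rules: dict = SUBSTITUTIONS) -> str:
    """Replace misheard phrases, longest first, matching whole words only."""
    for wrong in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, rules[wrong], text, flags=re.IGNORECASE)
    return text


print(apply_substitutions("I'm using superbase with versel"))
# I'm using Supabase with Vercel
```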
System Tray Integration:
- Minimizes to system tray for background operation
- Right-click menu for quick access
- Visual status indicators
Settings Management:
- Tabbed interface for organized configuration
- Real-time validation and feedback
- Import/export functionality
Transcription History:
- Real-time log of all transcriptions
- Timestamp tracking
- Auto-scrolling display
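A bounded, timestamped history like this maps naturally onto `collections.deque` with a `maxlen`. A sketch of the pattern (not VoiceBox's implementation):

```python
from collections import deque
from datetime import datetime


class TranscriptionHistory:
    """Keeps only the most recent transcriptions, each with a timestamp."""

    def __init__(self, max_entries: int = 50):
        self.entries = deque(maxlen=max_entries)  # oldest entries drop off automatically

    def add(self, text: str):
        self.entries.append((datetime.now(), text))

    def latest(self, n: int = 10):
        return list(self.entries)[-n:]


history = TranscriptionHistory(max_entries=2)
history.add("first")
history.add("second")
history.add("third")  # "first" is evicted
print([text for _, text in history.latest()])  # ['second', 'third']
```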
```bash
python main.py --help    # Show help
python main.py --gui     # Run with GUI (system tray)
python main.py --test    # Test initialization
python main.py --config  # Show config file path
```

Audio not recording:
- Check microphone permissions
- Try listing audio devices: modify the code to call `AudioRecorder.list_audio_devices()`

Hotkey not working:
- Check for conflicting applications using the same hotkey
- Try a different hotkey combination
- On Linux, you may need to run with appropriate permissions

Transcription fails:
- Local mode: the first run downloads the model (this may take time)
- API mode: check your API key and internet connection
- Check the console output for detailed error messages

Text not inserted:
- Ensure the target application has focus
- Try a different text insertion method in the config
- On macOS, you may need to grant accessibility permissions
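The terminal-compatible insertion mentioned earlier usually comes down to picking the paste shortcut by window class, since most terminals reserve plain Ctrl+V for control characters. A sketch of that decision logic (the app-name set and the `paste_shortcut` helper are illustrative assumptions, not VoiceBox's real code):

```python
# Window classes that are known to paste with Ctrl+Shift+V (assumed list)
TERMINAL_APPS = {"gnome-terminal", "konsole", "xterm", "alacritty", "kitty"}


def paste_shortcut(window_name: str) -> str:
    """Terminals reserve Ctrl+V, so use Ctrl+Shift+V there; Ctrl+V elsewhere."""
    if window_name.lower() in TERMINAL_APPS:
        return "ctrl+shift+v"
    return "ctrl+v"


print(paste_shortcut("Alacritty"))  # ctrl+shift+v
print(paste_shortcut("firefox"))    # ctrl+v
```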
```
src/
├── main.py               # Application entry point
├── audio/
│   └── capture.py        # Audio recording
├── transcription/
│   ├── base.py           # Service interface
│   ├── local.py          # Local Whisper with language support
│   └── api.py            # OpenAI API with language support
├── system/
│   ├── hotkeys.py        # Global hotkey handling
│   └── text_insertion.py # Text insertion with terminal support
├── text/
│   └── substitutions.py  # Intelligent text corrections
├── ui/
│   └── gui.py            # PyQt6 GUI interface
└── config/
    └── manager.py        # Configuration management
```
Each component can be tested independently:

```python
# Test audio recording
from audio.capture import AudioRecorder

recorder = AudioRecorder()
recorder.start_recording()
# ... wait ...
audio_file = recorder.stop_recording()

# Test transcription
from transcription.local import LocalWhisperService

service = LocalWhisperService()
text = service.transcribe(audio_file)

# Test text insertion
from system.text_insertion import TextInserter

inserter = TextInserter()
inserter.insert_text("Hello world")

# Test text substitutions
from text.substitutions import SubstitutionManager

sub_manager = SubstitutionManager()
corrected = sub_manager.apply_substitutions("I'm using superbase with versel")
print(corrected)  # "I'm using Supabase with Vercel"
```

Simple build:
```bash
make build
# OR: python build.py
```

Cross-platform distribution:

```bash
make dist                # Creates .tar.gz
./scripts/build-all.sh   # Multi-platform (requires Docker)
```

PyPI Package:

```bash
# Build package for PyPI
uv run python -m build

# Publish to PyPI (see PUBLISHING.md for details)
uv run twine upload dist/*
```

Manual PyInstaller:

```bash
uv run pyinstaller build_config/voicebox.spec
```

Executables appear in the dist/ directory, with installers included.
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly on your platform
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- faster-whisper for efficient local transcription
- OpenAI Whisper for the underlying speech recognition technology
- pynput for cross-platform input handling