dictr

Push-to-talk voice dictation for Linux.

Single binary - Private - Fast - Customizable

Features

  • Push-to-talk: hold a hotkey to record, release to transcribe and paste
  • Local inference: runs Whisper locally, so your audio never leaves your machine
  • CUDA GPU acceleration: optional NVIDIA GPU support for sub-second transcription
  • OpenAI API fallback: use the OpenAI Whisper API as an alternative backend
  • Text replacements: custom rules that substitute text in the transcription output
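
The push-to-talk flow in the first bullet can be pictured as a two-state machine. The following is only a sketch, not dictr's actual code: the type and method names are hypothetical, and it assumes that a recording shorter than the minimum duration (see --min-duration under Usage) is dropped rather than transcribed.

```rust
use std::time::Duration;

/// What the caller should do in response to a key event (hypothetical names).
#[derive(Debug, PartialEq)]
enum Action {
    StartRecording,
    Transcribe,
    Discard,
    Ignore,
}

#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Recording,
}

struct PushToTalk {
    state: State,
    min_duration: Duration,
}

impl PushToTalk {
    fn new(min_duration_ms: u64) -> Self {
        PushToTalk {
            state: State::Idle,
            min_duration: Duration::from_millis(min_duration_ms),
        }
    }

    /// Hotkey pressed: start recording unless we already are
    /// (for example because of keyboard auto-repeat).
    fn on_press(&mut self) -> Action {
        match self.state {
            State::Idle => {
                self.state = State::Recording;
                Action::StartRecording
            }
            State::Recording => Action::Ignore,
        }
    }

    /// Hotkey released after being held for `held`.
    /// Assumption: recordings shorter than the minimum are discarded.
    fn on_release(&mut self, held: Duration) -> Action {
        match self.state {
            State::Recording => {
                self.state = State::Idle;
                if held >= self.min_duration {
                    Action::Transcribe
                } else {
                    Action::Discard
                }
            }
            State::Idle => Action::Ignore,
        }
    }
}
```

With PushToTalk::new(300), a press followed by a release after 120 ms yields Discard in this sketch, while a release after 300 ms or more yields Transcribe.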

Usage

dictr                          # Default: AltGr hotkey, local whisper, xdotool type
dictr --hotkey F9              # Use F9 instead of AltGr
dictr --backend api            # Use OpenAI Whisper API (requires OPENAI_API_KEY)
dictr --api-url http://...     # Custom API endpoint
dictr --model /path/to/model   # Specific model file
dictr --paste                  # Use clipboard paste (better for accents/Unicode)
dictr --device AT2020          # Select mic by name substring
dictr --list-devices           # List available input devices
dictr --language fr            # Transcribe in French
dictr --initial-prompt '...'   # Guide transcription with context
dictr --min-duration 500       # Min recording duration in ms (default: 300)
dictr --verbose                # Debug output
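
To illustrate what "select mic by name substring" means for --device, here is a minimal sketch. The function name and the device names are made up for the example, and it assumes a case-sensitive match that returns the first hit; dictr's real matching rules may differ.

```rust
/// Return the first device whose name contains `query`.
/// Assumption: case-sensitive match, first match wins.
fn pick_device<'a>(names: &[&'a str], query: &str) -> Option<&'a str> {
    names.iter().copied().find(|name| name.contains(query))
}
```

Given the hypothetical device list ["HDA Intel PCH", "AT2020USB+ Analog Stereo"], the query "AT2020" selects the second entry. Run dictr --list-devices to see the actual names on your system.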

Install

Interactive installer

curl -fsSL https://raw.githubusercontent.com/mwmdev/dictr/main/install.sh | sh

Cargo

cargo install dictr

Then download a Whisper model (such as the ggml-base.bin file referenced in the example configuration) to ~/.local/share/dictr/models/.

Build from source

Requirements:

  • System: Linux with X11, xdotool, xclip, and ALSA or PipeWire
  • Build dependencies: cmake, clang, pkg-config, libasound2-dev, libx11-dev, libxi-dev, libxtst-dev, libxrandr-dev, libssl-dev
  • CUDA builds: NVIDIA CUDA toolkit

cargo build --release                  # CPU only
cargo build --release --features cuda  # With GPU

On NixOS, use nix-shell --run "cargo build --release"

Configuration

~/.config/dictr/config.toml:

hotkey = "AltGr"                 # Supported hotkeys: AltGr, Alt, Ctrl, RCtrl, Shift, RShift, Super, CapsLock, Space, Escape, F1-F12
backend = "local"                # "local" or "api"
model_path = "~/.local/share/dictr/models/ggml-base.bin"
api_key = ""                     # or set OPENAI_API_KEY env var
api_url = "https://api.openai.com/v1/audio/transcriptions"
typing_delay_ms = 2
min_duration_ms = 300
device = "AT2020USB+"
language = "en"
initial_prompt = "commit, readme, build, test, deploy, refactor" # Guide transcription with context (e.g. expected words, domain-specific terms)

[replacements]
"slash " = "/"
"new line" = "\n"

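The hotkey comment in the configuration above lists the accepted names, with F1-F12 standing for twelve separate keys. As a rough sketch of how such names could be validated (the enum and function are hypothetical, and exact-case matching is an assumption, not something dictr documents):

```rust
/// Hypothetical representation of the hotkey names from the config comment.
#[derive(Debug, PartialEq)]
enum Hotkey {
    AltGr,
    Alt,
    Ctrl,
    RCtrl,
    Shift,
    RShift,
    Super,
    CapsLock,
    Space,
    Escape,
    /// Function key, F1 through F12.
    F(u8),
}

/// Parse a hotkey name; returns None for anything not in the list.
/// Assumption: names must match exactly, including case.
fn parse_hotkey(name: &str) -> Option<Hotkey> {
    let key = match name {
        "AltGr" => Hotkey::AltGr,
        "Alt" => Hotkey::Alt,
        "Ctrl" => Hotkey::Ctrl,
        "RCtrl" => Hotkey::RCtrl,
        "Shift" => Hotkey::Shift,
        "RShift" => Hotkey::RShift,
        "Super" => Hotkey::Super,
        "CapsLock" => Hotkey::CapsLock,
        "Space" => Hotkey::Space,
        "Escape" => Hotkey::Escape,
        other => {
            // Anything else must look like "F<n>" with n between 1 and 12.
            let digits = other.strip_prefix('F')?;
            if !digits.bytes().all(|b| b.is_ascii_digit()) {
                return None;
            }
            let n: u8 = digits.parse().ok()?;
            if !(1..=12).contains(&n) {
                return None;
            }
            Hotkey::F(n)
        }
    };
    Some(key)
}
```

In this sketch "F9" parses to Hotkey::F(9), while "F13" and "altgr" are rejected.
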
Text replacements

The [replacements] table defines substitutions applied to the transcription output: each key found in the transcribed text is replaced with its corresponding value. This is useful for special cases such as "slash " → "/" or "new line" → "\n".
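
The substitution described above can be sketched as a plain sequence of string replacements. This is an illustration under assumptions, not dictr's implementation: the function name is hypothetical, matching is assumed to be literal and case-sensitive, and rules are applied in the order given (a TOML table does not itself guarantee an order).

```rust
/// Apply each (key, value) rule to `text`, one after another.
/// Assumptions: literal, case-sensitive matching; rules applied in slice order.
fn apply_replacements(text: &str, rules: &[(&str, &str)]) -> String {
    let mut out = text.to_string();
    for &(from, to) in rules {
        // An empty key would match between every character, so skip it.
        if from.is_empty() {
            continue;
        }
        out = out.replace(from, to);
    }
    out
}
```

With the two rules from the example configuration, the input "src slash main new line done" becomes "src /main", a line break surrounded by single spaces, then "done". The trailing space in the "slash " key is what lets the "/" attach directly to the following word.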

License

Licensed under either of MIT or Apache-2.0 at your option.