A universal AI toolkit for high-performance Speech-to-Text (STT) and Text-to-Speech (TTS) processing, designed for low-latency and easy model integration.

vox

Cross-platform TTS CLI — local voice synthesis with three backends.

                         vox
                          |
            +-------------+-------------+
            |             |             |
          say          qwen        qwen-native
       (macOS)     (MLX/Python)    (pure Rust)
        native      Apple Silicon   cross-platform
                                   CPU/Metal/CUDA
                          |
                        rodio
                    (audio playback)

Install

# From source
cargo install --path .

# Quick install (macOS / Linux / WSL)
curl -fsSL https://raw.githubusercontent.com/rtk-ai/vox/main/install.sh | sh

| Platform    | Default backend | GPU              |
|-------------|-----------------|------------------|
| macOS       | say             | --features metal |
| Linux / WSL | qwen-native     | --features cuda  |

On Debian/Ubuntu-based Linux (including WSL), install the ALSA development headers before building: sudo apt install libasound2-dev.
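
As a rough illustration of the platform table above, the following sketch picks a GPU feature flag from the kernel name and prints the matching build command instead of running it. The helper name vox_gpu_feature is hypothetical, and passing the flag to cargo install is an assumption about how the features listed in the table are meant to be enabled.

```shell
# Hypothetical helper: map a kernel name (as printed by `uname -s`)
# to the GPU feature listed in the platform table.
vox_gpu_feature() {
  case "$1" in
    Darwin) echo "metal" ;;  # macOS
    Linux)  echo "cuda" ;;   # Linux and WSL both report "Linux"
    *)      echo "" ;;       # unknown platform: no GPU feature
  esac
}

feature="$(vox_gpu_feature "$(uname -s)")"

# Print the command rather than executing it, so the sketch has no side effects.
# Assumption: the feature flag is passed to `cargo install`.
if [ -n "$feature" ]; then
  echo "cargo install --path . --features $feature"
else
  echo "cargo install --path ."
fi
```

The CUDA feature presumably only helps on machines with an NVIDIA GPU and toolkit installed; on other Linux hosts the plain cargo install --path . from the Install section is the safer choice.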

Usage with Claude Code

vox init                # MCP server (default)
vox init -m cli         # CLAUDE.md + Stop hook
vox init -m skill       # /speak slash command
vox init -m all         # all of the above

Each mode sets up a different integration:

| Mode  | What it does |
|-------|--------------|
| mcp   | Registers vox serve as an MCP server in ~/.claude.json (Claude Code) and in the Claude Desktop config. Exposes 8 tools: vox_speak, vox_list_voices, vox_clone_*, vox_config_*, vox_stats. |
| cli   | Creates a CLAUDE.md in your project with instructions for Claude to call vox after significant tasks. Adds a Stop hook in .claude/settings.json that says "Terminé" (French for "Done") after each response. |
| skill | Creates a /speak slash command in ~/.claude/commands/speak.md. |
| all   | Runs all three modes (default). |

  Claude Code
      |
   MCP stdio
      |
  vox serve ──> vox_speak, vox_list_voices, ...

Running vox init again is safe — it skips files that are already configured.
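
To make the MCP path in the diagram above more concrete, here is a minimal sketch of the kind of message a client sends over stdio to invoke a tool. MCP uses JSON-RPC 2.0, and tool invocations use the tools/call method with a tool name and an arguments object. The helper name mcp_speak_request and the "text" argument are assumptions for illustration; the real argument names are whatever the vox_speak tool schema declares.

```shell
# Hypothetical helper: build a JSON-RPC 2.0 "tools/call" request for vox_speak.
# $1 = request id (number), $2 = text to speak.
# This sketch does no JSON escaping, so $2 must not contain quotes or backslashes.
# The "text" argument name is a guess; check the schema returned by tools/list.
mcp_speak_request() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"vox_speak","arguments":{"text":"%s"}}}\n' "$1" "$2"
}

mcp_speak_request 1 "Build finished"
# prints: {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"vox_speak","arguments":{"text":"Build finished"}}}
```

A real session also needs the MCP initialize handshake before any tools/call request, which Claude Code performs on its own once the server is registered.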

Standalone CLI

vox "Hello, world."
vox -b qwen-native "Cross-platform TTS."
echo "Hello" | vox
vox --list-voices

Voice cloning

vox clone add patrick --audio ~/voice.wav --text "Transcription"
vox clone record myvoice --duration 10
vox -v patrick "This speaks with your voice."
vox clone list
vox clone remove patrick

Preferences

vox config show
vox config set backend qwen
vox config set lang fr
vox config set voice Chelsie
vox config reset

Optional: Qwen backend (macOS)

Neural TTS via Python/MLX on Apple Silicon:

uv pip install mlx-audio

The model is downloaded automatically on first use (~1.2 GB).

Data

All state is stored locally in ~/.config/vox/:

~/.config/vox/
├── vox.db          # SQLite: preferences, voice clones, usage logs
└── clones/         # audio files for voice clones

| Env var        | Description               |
|----------------|---------------------------|
| VOX_CONFIG_DIR | Override config directory |
| VOX_DB_PATH    | Override database path    |
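
The two overrides can be pictured as a simple fallback chain. The sketch below is one plausible resolution order, not necessarily what vox does internally: the function names are hypothetical, and it assumes that VOX_DB_PATH wins when set and that the database otherwise lives at vox.db inside the (possibly overridden) config directory.

```shell
# Hypothetical resolution of the config directory:
# use VOX_CONFIG_DIR if set and non-empty, else the default location.
vox_config_dir() {
  printf '%s\n' "${VOX_CONFIG_DIR:-$HOME/.config/vox}"
}

# Hypothetical resolution of the database path:
# use VOX_DB_PATH if set and non-empty, else vox.db inside the config directory.
vox_db_path() {
  printf '%s\n' "${VOX_DB_PATH:-$(vox_config_dir)/vox.db}"
}

echo "config dir: $(vox_config_dir)"
echo "database:   $(vox_db_path)"
```

Pointing VOX_CONFIG_DIR at a scratch directory is a convenient way to try out voice clones or preferences without touching the state in ~/.config/vox/.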

License

MIT
