Cross-Platform Per-Process Audio Capture
ProcTap is a Python library for per-process audio capture with platform-specific backends.
Capture audio from a specific process only โ without system sounds or other app audio mixed in. Ideal for VRChat, games, DAWs, browsers, and AI audio analysis pipelines.
| Platform | Status | Backend | Notes |
|---|---|---|---|
| Windows | โ Fully Supported | WASAPI (C++ native) | Windows 10/11 (20H1+) |
| Linux | โ Fully Supported | PipeWire Native / PulseAudio | Per-process isolation, auto-fallback (v0.3.0+) |
| macOS | โ Officially Supported | ScreenCaptureKit | macOS 13+ (Ventura), bundleID-based (v0.4.0+) |
* Linux is fully supported with PipeWire/PulseAudio (v0.3.0+). macOS is officially supported with ScreenCaptureKit (v0.4.0+).
-
๐ง Capture audio from a single target process (VRChat, games, browsers, Discord, DAWs, streaming tools, etc.)
-
๐ Cross-platform architecture โ Windows (fully supported) | Linux (fully supported, v0.3.0+) | macOS (officially supported, v0.4.0+)
-
โก Platform-optimized backends โ Windows: ActivateAudioInterfaceAsync (modern WASAPI) โ Linux: PipeWire Native API / PulseAudio (fully supported, v0.3.0+) โ macOS: ScreenCaptureKit API (macOS 13+, bundleID-based, v0.4.0+)
-
๐งต Low-latency, thread-safe audio engine โ 48 kHz / stereo / float32 format (Windows)
-
๐ Python-friendly high-level API
- Callback-based streaming
- Async generator streaming (
async for)
-
๐ Native extensions for high-performance โ C++ extension on Windows for optimal throughput
From PyPI:
pip install proc-tapPlatform-specific dependencies are automatically installed:
- Windows: No additional dependencies
- Linux:
pulsectlis automatically installed, but you also need system packages:# Ubuntu/Debian sudo apt-get install pulseaudio-utils # Fedora/RHEL sudo dnf install pulseaudio-utils
Optional: High-Quality Audio Resampling (74% faster / 3.8x speedup for sample rate conversion):
pip install proc-tap[hq-resample]Performance: With libsamplerate, resampling achieves 0.66ms per 10ms chunk (vs 2.6ms with scipy-only).
Compatibility Notes:
- โ Python 3.10-3.12: Works on all platforms
- โ Linux/macOS + Python 3.13+: Should work (you can try it!)
โ ๏ธ Windows + Python 3.13+: May fail to build (as of 2025-01)- If it fails, the library automatically falls back to scipy's polyphase filtering
- Still provides excellent audio quality, just 74% slower for resampling
- You can still try installing - if it works, great! If not, no harm done.
๐ Read the Full Documentation for detailed guides and API reference.
From TestPyPI (for testing pre-releases):
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ proctapFrom Source:
git clone https://github.com/m96-chan/ProcTap
cd ProcTap
pip install -e .ProcTap includes a CLI for piping audio directly to FFmpeg or other tools:
# Pipe to FFmpeg (MP3 encoding) - Direct command
proctap --pid 12345 --stdout | ffmpeg -f s16le -ar 48000 -ac 2 -i pipe:0 output.mp3
# Or using python -m
python -m proctap --pid 12345 --stdout | ffmpeg -f s16le -ar 48000 -ac 2 -i pipe:0 output.mp3
# Using process name instead of PID
proctap --name "VRChat.exe" --stdout | ffmpeg -f s16le -ar 48000 -ac 2 -i pipe:0 output.mp3
# FLAC encoding (lossless)
proctap --pid 12345 --stdout | ffmpeg -f s16le -ar 48000 -ac 2 -i pipe:0 output.flac
# Native float32 output (no conversion)
proctap --pid 12345 --format float32 --stdout | ffmpeg -f f32le -ar 48000 -ac 2 -i pipe:0 output.mp3CLI Options:
| Option | Description |
|---|---|
--pid PID |
Process ID to capture (required if --name not used) |
--name NAME |
Process name to capture (e.g., VRChat.exe or VRChat) |
--stdout |
Output raw PCM to stdout for piping (required) |
--format {int16,float32} |
Output format: int16 or float32 (default: int16) |
--verbose |
Enable verbose logging to stderr |
--list-audio-procs |
List all processes currently playing audio |
Finding Process IDs:
# Windows
tasklist | findstr "VRChat"
# Linux/macOS
ps aux | grep VRChatFFmpeg Format Arguments:
The CLI outputs raw PCM at 48kHz stereo. FFmpeg needs these arguments based on --format:
int16 (default):
-f s16le: Signed 16-bit little-endian PCM-ar 48000: Sample rate (48kHz, fixed)-ac 2: Channels (stereo, fixed)-i pipe:0: Read from stdin
float32:
-f f32le: 32-bit float little-endian PCM-ar 48000: Sample rate (48kHz, fixed)-ac 2: Channels (stereo, fixed)-i pipe:0: Read from stdin
Windows (Fully Supported):
- Windows 10 / 11 (20H1 or later)
- Python 3.10+
- WASAPI support
- No admin privileges required
Linux (Fully Supported - v0.3.0+):
- Linux with PulseAudio or PipeWire
- Python 3.10+
- Auto-detection: Automatically selects best available backend
- Native PipeWire API (in development, experimental):
libpipewire-0.3-dev:sudo apt-get install libpipewire-0.3-dev- Target latency: ~2-5ms (when fully implemented)
- Auto-selected when available (may fall back to subprocess)
- PipeWire subprocess:
pw-record: install withsudo apt-get install pipewire-media-session
- PulseAudio fallback:
pulsectllibrary: automatically installedpareccommand:sudo apt-get install pulseaudio-utils
- โ Per-process isolation using null-sink strategy
- โ Graceful fallback chain: Native โ PipeWire subprocess โ PulseAudio
macOS (Officially Supported - v0.4.0+):
- macOS 13.0 (Ventura) or later (macOS 13+ recommended)
- Python 3.10+
- Swift helper binary (screencapture-audio)
- Screen Recording permission (automatically prompted)
- โ ScreenCaptureKit Backend: Apple Silicon compatible, no AMFI/SIP hacks needed
- โ Simple Permissions: Screen Recording only (no Microphone/TCC hacks)
- โ Low Latency: ~10-15ms audio capture
from proctap import ProcTap, StreamConfig
def on_chunk(pcm: bytes, frames: int):
print(f"Received {len(pcm)} bytes ({frames} frames)")
pid = 12345 # Target process ID
tap = ProcTap(pid, StreamConfig(), on_data=on_chunk)
tap.start()
input("Recording... Press Enter to stop.\n")
tap.close()import asyncio
from proctap import ProcTap
async def main():
tap = ProcTap(pid=12345)
tap.start()
async for chunk in tap.iter_chunks():
print(f"PCM chunk size: {len(chunk)} bytes")
asyncio.run(main())Control Methods:
| Method | Description |
|---|---|
start() |
Start WASAPI per-process capture |
stop() |
Stop capture |
close() |
Release native resources |
Data Access:
| Method | Description |
|---|---|
iter_chunks() |
Async generator yielding PCM chunks |
read(timeout=1.0) |
Synchronous: read one chunk (blocking) |
Properties:
| Property | Type | Description |
|---|---|---|
is_running |
bool | Check if capture is active |
pid |
int | Get target process ID |
config |
StreamConfig | Get stream configuration |
Utility Methods:
| Method | Description |
|---|---|
set_callback(callback) |
Change or remove audio callback |
get_format() |
Get audio format info (dict) |
Windows Backend Format (WASAPI, returned to Python):
| Parameter | Value | Description |
|---|---|---|
| Sample Rate | 48,000 Hz | Professional audio quality |
| Channels | 2 | Stereo |
| Format | float32 | IEEE 754 floating point (-1.0 to +1.0) |
| Fallback | 44.1kHz int16 | Auto-converted to 48kHz float32 if float32 init fails |
Important Note: For WAV file output, you must convert float32 to int16:
import numpy as np
def on_data(pcm: bytes, frames: int):
# Convert float32 to int16 for WAV files
float_samples = np.frombuffer(pcm, dtype=np.float32)
int16_samples = (np.clip(float_samples, -1.0, 1.0) * 32767).astype(np.int16)
wav.writeframes(int16_samples.tobytes())- ๐ฎ Record audio from one game only
- ๐ถ Capture VRChat audio cleanly (without system sounds)
- ๐ Feed high-SNR audio into AI recognition models
- ๐น Alternative to OBS "Application Audio Capture"
- ๐ง Capture DAW/app playback for analysis tools
ProcTap includes optional contrib modules for advanced audio processing:
Monitor and analyze audio from processes in real-time with spectrum analysis, volume meters, and frequency visualization.
CLI Mode (Terminal-based):
# Analyze by process ID
python -m proctap.contrib.analysis --pid 12345
# Analyze by process name
python -m proctap.contrib.analysis --name "VRChat.exe"GUI Mode (Matplotlib window):
# Launch GUI visualizer
python -m proctap.contrib.analysis --pid 12345 --gui
# Adjust FFT size for better frequency resolution
python -m proctap.contrib.analysis --pid 12345 --gui --fft-size 4096Features:
- ๐ Real-time spectrum analyzer (FFT-based frequency analysis)
- ๐ Volume meters (RMS and peak levels in dB)
- ๐ต Frequency band analysis (Sub, Bass, Mid, Treble, Presence, Brilliance)
- ๐ป Terminal visualization (CLI mode) or ๐ Matplotlib plots (GUI mode)
- โ๏ธ Configurable FFT size (512, 1024, 2048, 4096, 8192)
Programmatic Usage:
from proctap import ProcessAudioCapture
from proctap.contrib import AudioAnalyzer, CLIVisualizer
# Create analyzer
analyzer = AudioAnalyzer(sample_rate=48000, fft_size=2048)
# Create callback for audio processing
def on_audio(pcm: bytes, frames: int):
analyzer.process_audio(pcm)
# Start audio capture with callback
tap = ProcessAudioCapture(pid=12345, on_data=on_audio)
tap.start()
# Create and run visualizer
visualizer = CLIVisualizer(analyzer)
visualizer.start() # Blocking - displays in terminalOptional Dependencies:
- CLI mode: Included (uses numpy/scipy)
- GUI mode: Requires
matplotlib(pip install matplotlib)
from proctap import ProcTap
import wave
pid = 12345
wav = wave.open("output.wav", "wb")
wav.setnchannels(2)
wav.setsampwidth(2) # 16-bit PCM
wav.setframerate(44100) # Native format is 44.1 kHz
def on_data(pcm, frames):
wav.writeframes(pcm)
with ProcTap(pid, on_data=on_data):
input("Recording... Press Enter to stop.\n")
wav.close()from proctap import ProcTap
tap = ProcTap(pid=12345)
tap.start()
try:
while True:
chunk = tap.read(timeout=1.0) # Blocking read
if chunk:
print(f"Got {len(chunk)} bytes")
# Process audio data...
else:
print("Timeout, no data")
except KeyboardInterrupt:
pass
finally:
tap.close()from proctap import ProcessAudioCapture, StreamConfig
import wave
pid = 12345 # Your target process ID
# Create WAV file
wav = wave.open("linux_capture.wav", "wb")
wav.setnchannels(2)
wav.setsampwidth(2)
wav.setframerate(44100)
def on_data(pcm, frames):
wav.writeframes(pcm)
# Create stream config (Linux backend respects these settings)
config = StreamConfig(sample_rate=44100, channels=2)
try:
with ProcessAudioCapture(pid, config=config, on_data=on_data):
print("โ ๏ธ Make sure the process is actively playing audio!")
input("Recording... Press Enter to stop.\n")
finally:
wav.close()Linux-specific requirements:
- Install system package:
sudo apt-get install pulseaudio-utils(providespareccommand) - Python dependency
pulsectlis automatically installed withpip install proc-tap - The target process must be actively playing audio
- See examples/linux_basic.py for a complete example
from proctap import ProcessAudioCapture, StreamConfig
import wave
pid = 12345 # Your target process ID
# Create WAV file
wav = wave.open("macos_capture.wav", "wb")
wav.setnchannels(2)
wav.setsampwidth(2)
wav.setframerate(48000) # macOS backend default is 48 kHz
def on_data(pcm, frames):
wav.writeframes(pcm)
# Create stream config (macOS backend respects these settings)
config = StreamConfig(sample_rate=48000, channels=2)
try:
with ProcessAudioCapture(pid, config=config, on_data=on_data):
print("โ ๏ธ Make sure the process is actively playing audio!")
print("โ ๏ธ On first run, macOS will prompt for Screen Recording permission.")
input("Recording... Press Enter to stop.\n")
finally:
wav.close()macOS-specific requirements (v0.4.0+):
- macOS 13.0 (Ventura) or later
- Swift helper binary (screencapture-audio) - automatically built during installation
- Screen Recording permission - macOS will prompt on first run
- The target process must be actively playing audio
- Works with bundleID-based capture (PID is automatically converted to bundleID)
- See examples/macos_screencapture_test.py for a complete example
Building the Swift helper manually:
cd src/proctap/swift/screencapture-audio
swift build -c releaseNote: The ScreenCaptureKit backend (v0.4.0+) is recommended over the experimental PyObjC/C extension backends.
git clone https://github.com/m96-chan/ProcTap
cd ProcTap
pip install -e .Windows Build Requirements:
- Visual Studio Build Tools
- Windows SDK
- CMake (if you modularize the C++ code)
Linux:
- No C++ compiler required (pure Python)
- System dependencies:
pulseaudio-utilsorpipewirewithlibpipewire-0.3-dev
macOS:
- Swift toolchain required for building the ScreenCaptureKit helper (v0.4.0+)
- Xcode Command Line Tools:
xcode-select --install - No C++ compiler required (pure Python backend)
- Helper binary location:
src/proctap/swift/screencapture-audio/
Contributions are welcome! We have structured issue templates to help guide your contributions:
- ๐ Bug Report - Report bugs or unexpected behavior
- โจ Feature Request - Suggest new features or enhancements
- โก Performance Issue - Report performance problems or optimizations
- ๐ง Type Hints / Async - Improve type annotations or async functionality
- ๐ Documentation - Improve docs, examples, or guides
Special Interest:
- PRs from WASAPI/C++ experts are especially appreciated
- Linux backend improvements (PulseAudio/PipeWire per-app isolation)
- macOS backend testing (ScreenCaptureKit on macOS 13+)
- Cross-platform testing and compatibility
- Performance profiling and optimization
MIT License
m96-chan
Windows Audio / VRChat Tools / Python / C++
https://github.com/m96-chan