libobs: Add SIMD-optimized audio pipeline with memory pooling #13201
Closed
marcusbooker77 wants to merge 1 commit into obsproject:master from
Implement vectorized audio mixing using SSE2/AVX intrinsics with runtime CPU feature detection. Add pre-allocated memory pool for audio buffers to eliminate malloc/free overhead on hot paths. Introduce multi-threaded audio processing pipeline with fixed-size thread pool and SPSC job queue for better multi-core utilization. Includes related changes to obs-source, obs-video, and core initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member
We do not accept AI-generated PRs. In the future, please take the time to read any project's published guidelines before submitting PRs.
Summary
This PR introduces five major performance optimizations to the libobs audio and video pipelines:
1. SIMD-Vectorized Audio Mixing (`obs-audio-optimized.c`)
- SSE2 path using `_mm_add_ps`, available on all x86-64 CPUs
- AVX path using `_mm256_add_ps`, enabled via runtime `CPUID` detection (no recompile needed)
- `_mm_prefetch` hints on both the source and destination buffers reduce L1 cache misses
- Replaces the scalar `*(mix++) += *(aud++)` hot loop in `mix_audio()`, which runs every audio tick (~21 ms at 48 kHz)

2. Video Frame Copy with Non-Temporal Stores (`copy_video_plane_optimized`)
- `_mm_stream_si128` non-temporal stores bypass the CPU cache, avoiding cache pollution from write-once frame data (these stores require a 16-byte-aligned destination)
- Falls back to `_mm_storeu_si128` vectorized copies for aligned-but-small planes, and to plain `memcpy` for unaligned destinations
- When `width`, `dst_stride`, and `src_stride` are all equal, the plane is copied as one contiguous block, avoiding per-line overhead

3. Pre-Allocated Audio Buffer Memory Pool (`obs-audio-pool.c`/`.h`)
- Eliminates `malloc`/`free` overhead in the audio pipeline hot path
- Protected by a `pthread_mutex` with minimal contention (the lock is held only during the pointer swap)

4. Multi-Threaded Audio Processing (`obs-audio-threaded.c`/`.h`)
- Fixed-size thread pool (`logical_cores - 1` workers, capped at 16)
- Workers are woken via `pthread_cond` signaling
- Logs at `LOG_WARNING` level

5. Lock-Free SPSC Queue (`util/spsc-queue.h`)
- Uses `memory_order_acquire`/`release` ordering
- Implemented with `<stdatomic.h>` or MSVC intrinsics

Supporting Changes
- `obs-audio.c`: Calls `mix_audio_optimized()` and `zero_audio_buffer_optimized()` instead of scalar loops
- `obs-video.c`: Calls `copy_video_plane_optimized()` for `set_gpu_converted_plane()` and `copy_rgbx_frame()`; adds MMCSS `AvSetMmThreadCharacteristics("Pro Audio")` on Windows for the graphics thread
- `obs-video-gpu-encode.c`: Adds an MMCSS thread priority boost for the GPU encode thread
- `obs-source.c`: Integrates pool-allocated audio buffers for per-source audio data
- `obs.c`: Initializes and tears down the audio pool and thread pool during `obs_startup`/`obs_shutdown`
- `obs-internal.h`: Adds pool and thread pool pointers to `obs_core_audio`

Expected Performance Impact
Based on analysis and architectural properties (formal benchmarks in progress):
Target: ≥15% overall CPU reduction in typical streaming scenarios (1080p30, 4+ sources).
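For reference, a minimal sketch of the runtime-dispatched mixing loop described in item 1 of the Summary. It assumes GCC or Clang on x86-64 (`__builtin_cpu_supports` and `__attribute__((target("avx")))` are compiler features, not libobs APIs); the names `mix_scalar`, `mix_sse`, `mix_avx`, `mix_fn`, and `select_mix_fn` are hypothetical and are not the functions in `obs-audio-optimized.c`. The `_mm_prefetch` hints are omitted, MSVC x64 gets only the SSE path here, and other architectures fall back to the scalar loop.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#else
#define HAVE_X86_SIMD 0
#endif

/* Scalar reference: same semantics as the *(mix++) += *(aud++) loop. */
static void mix_scalar(float *mix, const float *aud, size_t count)
{
	for (size_t i = 0; i < count; i++)
		mix[i] += aud[i];
}

#if HAVE_X86_SIMD
/* SSE path: 4 floats per iteration; unaligned loads/stores, so the
 * buffers need no particular alignment. */
static void mix_sse(float *mix, const float *aud, size_t count)
{
	size_t i = 0;
	for (; i + 4 <= count; i += 4) {
		__m128 m = _mm_loadu_ps(mix + i);
		__m128 a = _mm_loadu_ps(aud + i);
		_mm_storeu_ps(mix + i, _mm_add_ps(m, a));
	}
	mix_scalar(mix + i, aud + i, count - i); /* remaining 0-3 samples */
}
#endif

#if HAVE_X86_SIMD && defined(__GNUC__)
/* AVX path: 8 floats per iteration. The target attribute lets this one
 * function use AVX without compiling the whole file with -mavx. */
__attribute__((target("avx")))
static void mix_avx(float *mix, const float *aud, size_t count)
{
	size_t i = 0;
	for (; i + 8 <= count; i += 8) {
		__m256 m = _mm256_loadu_ps(mix + i);
		__m256 a = _mm256_loadu_ps(aud + i);
		_mm256_storeu_ps(mix + i, _mm256_add_ps(m, a));
	}
	mix_scalar(mix + i, aud + i, count - i); /* remaining 0-7 samples */
}
#endif

typedef void (*mix_fn)(float *mix, const float *aud, size_t count);

/* Pick the widest available implementation once, at runtime. */
static mix_fn select_mix_fn(void)
{
#if HAVE_X86_SIMD && defined(__GNUC__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx"))
		return mix_avx;
#endif
#if HAVE_X86_SIMD
	return mix_sse;
#else
	return mix_scalar;
#endif
}
```

Resolving the function pointer once (for example during startup) keeps the per-tick cost to a single indirect call, and the scalar tail handles sample counts that are not a multiple of the vector width.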
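The stride fast path and non-temporal copy from item 2 of the Summary can be sketched as follows, assuming SSE2 on x86-64. `copy_plane` and `copy_row` are hypothetical names (not the PR's `copy_video_plane_optimized`), and `width_bytes` is assumed to be the row length in bytes. Because `_mm_stream_si128` requires a 16-byte-aligned destination, misaligned rows go through `memcpy`; the final `_mm_sfence` orders the weakly ordered streaming stores before any later store, such as one that hands the frame to another thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Copy one run of bytes, streaming past the cache when dst is aligned. */
static void copy_row(uint8_t *dst, const uint8_t *src, size_t bytes)
{
#if HAVE_SSE2
	if (((uintptr_t)dst & 15) == 0) {
		size_t i = 0;
		for (; i + 16 <= bytes; i += 16) {
			__m128i v = _mm_loadu_si128((const __m128i *)(src + i));
			_mm_stream_si128((__m128i *)(dst + i), v);
		}
		memcpy(dst + i, src + i, bytes - i); /* sub-16-byte tail */
		return;
	}
#endif
	memcpy(dst, src, bytes); /* unaligned destination or no SSE2 */
}

static void copy_plane(uint8_t *dst, uint32_t dst_stride, const uint8_t *src,
		       uint32_t src_stride, uint32_t width_bytes, uint32_t height)
{
	if (width_bytes == dst_stride && width_bytes == src_stride) {
		/* Packed plane: one contiguous copy, no per-line overhead. */
		copy_row(dst, src, (size_t)width_bytes * height);
	} else {
		for (uint32_t y = 0; y < height; y++)
			copy_row(dst + (size_t)y * dst_stride,
				 src + (size_t)y * src_stride, width_bytes);
	}
#if HAVE_SSE2
	_mm_sfence(); /* order streaming stores before later stores */
#endif
}
```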
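The memory pool in item 3 of the Summary can be pictured as a fixed-size free stack: every block is carved from one up-front allocation, and acquire/release hold the mutex only long enough to pop or push a pointer. `struct audio_pool` and the `audio_pool_*` functions below are hypothetical, not the API in `obs-audio-pool.h`. The sketch assumes all blocks share one size and that callers release only blocks obtained from the same pool.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct audio_pool {
	pthread_mutex_t lock;
	unsigned char *storage; /* one contiguous allocation for all blocks */
	void **free_list;       /* stack of currently available blocks */
	size_t free_count;
};

/* Blocks stay aligned to malloc's guarantee only if block_size is a
 * multiple of it (e.g. 16 bytes on x86-64 glibc). */
static int audio_pool_init(struct audio_pool *pool, size_t block_size, size_t count)
{
	pool->storage = malloc(block_size * count);
	pool->free_list = malloc(sizeof(void *) * count);
	if (!pool->storage || !pool->free_list) {
		free(pool->storage);
		free(pool->free_list);
		return -1;
	}
	for (size_t i = 0; i < count; i++)
		pool->free_list[i] = pool->storage + i * block_size;
	pool->free_count = count;
	pthread_mutex_init(&pool->lock, NULL);
	return 0;
}

/* Pop one block; returns NULL when the pool is exhausted. */
static void *audio_pool_acquire(struct audio_pool *pool)
{
	void *block = NULL;
	pthread_mutex_lock(&pool->lock);
	if (pool->free_count > 0)
		block = pool->free_list[--pool->free_count];
	pthread_mutex_unlock(&pool->lock);
	return block;
}

/* Push a block back; the critical section is a single pointer store. */
static void audio_pool_release(struct audio_pool *pool, void *block)
{
	pthread_mutex_lock(&pool->lock);
	pool->free_list[pool->free_count++] = block;
	pthread_mutex_unlock(&pool->lock);
}

static void audio_pool_free(struct audio_pool *pool)
{
	pthread_mutex_destroy(&pool->lock);
	free(pool->free_list);
	free(pool->storage);
}
```

Returning `NULL` on exhaustion leaves the fallback policy (for example, a regular `malloc`) to the caller, so the hot path never allocates.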
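A minimal single-producer/single-consumer ring buffer using the `memory_order_acquire`/`release` pairing named in item 5 of the Summary. It uses only C11 `<stdatomic.h>` (the MSVC-intrinsics path is not shown), and `struct spsc_queue`, `SPSC_CAPACITY`, `spsc_push`, and `spsc_pop` are hypothetical names. Exactly one thread may call `spsc_push` and exactly one other thread may call `spsc_pop`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAPACITY 8 /* hypothetical size; must be a power of two */

struct spsc_queue {
	_Atomic size_t head; /* next slot to read; written only by the consumer */
	_Atomic size_t tail; /* next slot to write; written only by the producer */
	void *slots[SPSC_CAPACITY];
};

static void spsc_init(struct spsc_queue *q)
{
	atomic_init(&q->head, 0);
	atomic_init(&q->tail, 0);
}

/* Producer thread only. Returns false if the queue is full. */
static bool spsc_push(struct spsc_queue *q, void *item)
{
	size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
	/* acquire: the consumer has finished reading any slot it released */
	size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
	if (tail - head == SPSC_CAPACITY)
		return false;
	q->slots[tail & (SPSC_CAPACITY - 1)] = item;
	/* release: the slot write is visible before the new tail */
	atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer thread only. Returns false if the queue is empty. */
static bool spsc_pop(struct spsc_queue *q, void **item)
{
	size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
	/* acquire: pairs with the producer's release store of tail */
	size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
	if (head == tail)
		return false;
	*item = q->slots[head & (SPSC_CAPACITY - 1)];
	/* release: the slot read completes before the producer can reuse it */
	atomic_store_explicit(&q->head, head + 1, memory_order_release);
	return true;
}
```

`head` and `tail` are free-running counters masked by the power-of-two capacity, so "full" (`tail - head == SPSC_CAPACITY`) and "empty" (`head == tail`) are distinguishable without a spare slot, even after the counters wrap. A production version would typically also pad `head` and `tail` onto separate cache lines to avoid false sharing.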
Compatibility
- AVX is used only when `CPUID` confirms support; the SSE2 path is the baseline (guaranteed on all x86-64 CPUs)
- Uses `os_atomic_*` wrappers and pthreads; compiles on Windows/Linux/macOS

Test Plan
- Check the `LOG_INFO` message
- `obs-perf-monitor.ps1` shows a CPU reduction vs. the baseline build

Files Changed
- `libobs/obs-audio-optimized.c`
- `libobs/obs-audio-pool.c`
- `libobs/obs-audio-pool.h`
- `libobs/obs-audio-threaded.c`
- `libobs/obs-audio-threaded.h`
- `libobs/util/spsc-queue.h`
- `libobs/CMakeLists.txt`
- `libobs/obs-audio.c`
- `libobs/obs-video.c`
- `libobs/obs-video-gpu-encode.c`
- `libobs/obs-source.c`
- `libobs/obs.c`
- `libobs/obs-internal.h`

🤖 Generated with Claude Code