Performance: parallel file I/O and optimized reading #44

wojtek wants to merge 6 commits into zeux:master
Conversation
Use Windows API directly with FILE_FLAG_SEQUENTIAL_SCAN hint for better prefetching when reading files sequentially. This improves I/O throughput during index building. The POSIX implementation returns empty to allow fallback to standard file reading.
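A minimal sketch of what such a sequential-read fast path can look like; the name `readFileOptimized` comes from the PR's file list, but the signature and chunking here are assumptions, not the actual code. An empty result signals "not handled" so the caller can fall back to the standard reader:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Sketch: read a whole file sequentially. On Windows, open with
// FILE_FLAG_SEQUENTIAL_SCAN so the OS prefetches ahead of our reads.
// Returns an empty vector on failure or on non-Windows platforms.
std::vector<char> readFileOptimized(const char* path)
{
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                           nullptr);
    if (h == INVALID_HANDLE_VALUE)
        return {};

    LARGE_INTEGER size;
    if (!GetFileSizeEx(h, &size))
    {
        CloseHandle(h);
        return {};
    }

    std::vector<char> data(static_cast<size_t>(size.QuadPart));
    size_t offset = 0;

    // Read in bounded chunks; ReadFile takes a DWORD length.
    while (offset < data.size())
    {
        DWORD chunk = static_cast<DWORD>(
            std::min<size_t>(data.size() - offset, 1u << 20));
        DWORD read = 0;
        if (!ReadFile(h, data.data() + offset, chunk, &read, nullptr) || read == 0)
            break;
        offset += read;
    }

    CloseHandle(h);
    return offset == data.size() ? data : std::vector<char>();
#else
    (void)path;
    return {}; // POSIX: empty result means "use the standard file reader"
#endif
}
```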
Replace the manual scanning loop with memchr(), which C libraries typically implement with SIMD instructions for faster scanning.
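The shape of the change, sketched as a small helper (the function name is illustrative, not the one in stringutil.hpp):

```cpp
#include <cstddef>
#include <cstring>

// Find the next '\n' in [data, data + size). memchr is usually vectorized
// by the C library, unlike a naive byte-by-byte loop.
inline const char* findLineEnd(const char* data, size_t size)
{
    return static_cast<const char*>(memchr(data, '\n', size));
}
```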
Only one waiting producer needs to be woken when space becomes available, reducing unnecessary thread wake-ups.
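A minimal bounded queue illustrating the idea (a sketch, not the actual blockingqueue.hpp): when the consumer frees exactly one slot, only one blocked producer can make progress, so `notify_one` avoids waking producers that would immediately re-block.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue
{
public:
    explicit BlockingQueue(size_t capacity): capacity_(capacity) {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [&] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        notEmpty_.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        // Exactly one slot was freed, so one waiting producer suffices;
        // notify_all would wake every producer only to re-block all but one.
        notFull_.notify_one();
        return value;
    }

private:
    size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<T> items_;
};
```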
Replace std::map with std::unordered_map for O(1) average lookup instead of O(log n) when caching compiled regex patterns.
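A hypothetical cache mirroring the change in project.cpp (class and method names are assumptions). Hashing the pattern string gives O(1) average lookup; the tree-based std::map pays O(log n) comparisons per lookup, and regex compilation is expensive enough that the cache is on a hot path:

```cpp
#include <regex>
#include <string>
#include <unordered_map>

class RegexCache
{
public:
    // Return a compiled regex for the pattern, compiling it on first use.
    const std::regex& get(const std::string& pattern)
    {
        auto it = cache_.find(pattern);
        if (it == cache_.end())
            it = cache_.emplace(pattern, std::regex(pattern)).first;
        return it->second;
    }

private:
    // unordered_map references stay valid across inserts, so handing out
    // const references to cached entries is safe.
    std::unordered_map<std::string, std::regex> cache_;
};
```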
- Pre-allocate file buffer using size hint to avoid O(n^2) reallocation
- Use readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN on Windows
- Fallback to standard FileStream for non-Windows or special files
Use multiple reader threads to overlap file I/O with processing during index building. Reader threads read ahead while the main thread consumes files in order, improving throughput on systems with high I/O latency. The number of reader threads scales with available CPU cores, and a sliding window prevents readers from getting too far ahead.
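The scheme above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's build.cpp: reader threads claim file indices and read out of order, but the sliding-window wait keeps them within `window` entries of the consumer, and the main thread hands results downstream strictly in input order.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class ReadAhead
{
public:
    // Assumes the caller consumes every file via next() before destruction.
    ReadAhead(std::vector<std::string> files,
              std::function<std::string(const std::string&)> readFile,
              size_t window = 16,
              unsigned threads = std::thread::hardware_concurrency())
        : files_(std::move(files)), readFile_(std::move(readFile)), window_(window)
    {
        if (threads == 0) threads = 1;
        for (unsigned i = 0; i < threads; ++i)
            readers_.emplace_back([this] { readLoop(); });
    }

    ~ReadAhead()
    {
        for (std::thread& t : readers_)
            t.join();
    }

    // Main thread: returns file contents in input order.
    std::string next()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] { return done_.count(consumed_) != 0; });
        std::string data = std::move(done_[consumed_]);
        done_.erase(consumed_);
        ++consumed_;
        spaceFreed_.notify_all(); // window slid forward; readers may proceed
        return data;
    }

private:
    void readLoop()
    {
        for (;;)
        {
            size_t index;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Sliding window: never read more than window_ ahead of the consumer.
                spaceFreed_.wait(lock, [&] {
                    return nextToRead_ >= files_.size() ||
                           nextToRead_ < consumed_ + window_;
                });
                if (nextToRead_ >= files_.size())
                    return;
                index = nextToRead_++;
            }

            std::string data = readFile_(files_[index]); // I/O happens outside the lock

            {
                std::lock_guard<std::mutex> lock(mutex_);
                done_[index] = std::move(data);
            }
            ready_.notify_all();
        }
    }

    std::vector<std::string> files_;
    std::function<std::string(const std::string&)> readFile_;
    size_t window_;
    std::mutex mutex_;
    std::condition_variable ready_, spaceFreed_;
    std::map<size_t, std::string> done_; // completed reads awaiting consumption
    size_t nextToRead_ = 0;
    size_t consumed_ = 0;
    std::vector<std::thread> readers_;
};
```

The window bound is what keeps memory in check: without it, fast readers could buffer the entire input ahead of a slow consumer.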
Summary

Benchmark Results (UE5.6.1 Engine - 186k files, 2GB)

Tested on the following config (qgrep config for Unreal Engine 5.6.1):

    path E:/UnrealEngine-5.6.1-release/Engine
    include .(ini)$

186,283 files, 2GB input, 456MB index.

COLD BUILD:
INCREMENTAL:

Incremental updates only scan metadata, so no improvement is expected there.
Changes
- fileutil_win.cpp - readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN
- build.cpp - Parallel read-ahead with multiple reader threads
- stringutil.hpp - Use memchr for faster line-end finding
- blockingqueue.hpp - notify_one instead of notify_all
- project.cpp - unordered_map for regex cache