Performance: parallel file I/O and optimized reading #44

Open
wojtek wants to merge 6 commits into zeux:master from wojtek:performance-improvements

Conversation


wojtek commented Feb 3, 2026

Summary

  • Add parallel file I/O with read-ahead buffering for index building
  • Use Windows FILE_FLAG_SEQUENTIAL_SCAN for better prefetching
  • Pre-allocate file buffers to avoid O(n²) reallocation
  • Minor optimizations: memchr for line finding, unordered_map for regex cache

Benchmark Results (UE5.6.1 Engine - 186k files, 2GB)

Scenario      Baseline  Improved  Speedup
Cold Build    4.47s     2.72s     1.64x (39% faster)
Incremental   1.15s     1.14s     ~1.0x (no change)

Incremental updates only scan metadata, so no improvement expected there.

Changes

  1. fileutil_win.cpp - readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN
  2. build.cpp - Parallel read-ahead with multiple reader threads
  3. stringutil.hpp - Use memchr for faster line-end finding
  4. blockingqueue.hpp - notify_one instead of notify_all
  5. project.cpp - unordered_map for regex cache

Use the Windows API directly with the FILE_FLAG_SEQUENTIAL_SCAN hint for
better prefetching when reading files sequentially. This improves I/O
throughput during index building.

The POSIX implementation returns empty to allow fallback to standard file
reading.
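A minimal sketch of what such a reader might look like. This is not the actual qgrep code: the function name mirrors the one mentioned above, but the chunk size and error handling are assumptions, and the POSIX branch simply returns empty as the commit message describes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Sketch: read an entire file with a sequential-access hint on Windows.
// On non-Windows platforms, return empty so the caller falls back to
// standard file reading.
std::vector<char> readFileOptimized(const char* path)
{
#ifdef _WIN32
    HANDLE handle = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
        OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (handle == INVALID_HANDLE_VALUE)
        return {};

    LARGE_INTEGER size = {};
    if (!GetFileSizeEx(handle, &size))
    {
        CloseHandle(handle);
        return {};
    }

    // One allocation for the whole file; read in 1 MiB chunks (assumed size).
    std::vector<char> data(static_cast<size_t>(size.QuadPart));
    size_t offset = 0;
    while (offset < data.size())
    {
        DWORD chunk = static_cast<DWORD>(std::min<size_t>(data.size() - offset, 1 << 20));
        DWORD read = 0;
        if (!ReadFile(handle, data.data() + offset, chunk, &read, nullptr) || read == 0)
            break;
        offset += read;
    }
    CloseHandle(handle);
    data.resize(offset);
    return data;
#else
    (void)path;
    return {}; // empty: caller falls back to standard file reading
#endif
}
```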

Replace the manual loop with memchr(), which is typically optimized with
SIMD instructions for faster scanning.
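The pattern is a one-liner; a sketch of the line-end search (the function name is illustrative, not necessarily the one in stringutil.hpp):

```cpp
#include <cassert>
#include <cstring>

// Returns a pointer to the next '\n' in [pos, end), or end if there is none.
// memchr is usually vectorized by the C runtime, so it scans far faster than
// a hand-written byte-at-a-time loop.
inline const char* findLineEnd(const char* pos, const char* end)
{
    const void* nl = memchr(pos, '\n', static_cast<size_t>(end - pos));
    return nl ? static_cast<const char*>(nl) : end;
}
```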

Only one waiting producer needs to be woken when space becomes available,
reducing unnecessary thread wake-ups.
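A simplified bounded queue illustrating the change (not the actual blockingqueue.hpp; the class shape is an assumption). A pop frees exactly one slot, so notify_one wakes a single blocked producer; notify_all would wake every producer only for all but one to re-block:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class BlockingQueue
{
public:
    explicit BlockingQueue(size_t limit): limit(limit) {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lock(mutex);
        notFull.wait(lock, [&] { return items.size() < limit; });
        items.push(std::move(value));
        notEmpty.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex);
        notEmpty.wait(lock, [&] { return !items.empty(); });
        T value = std::move(items.front());
        items.pop();
        // Exactly one slot opened up, so waking one waiting producer
        // is sufficient; the rest would just go back to sleep.
        notFull.notify_one();
        return value;
    }

private:
    size_t limit;
    std::mutex mutex;
    std::queue<T> items;
    std::condition_variable notFull, notEmpty;
};
```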

Replace std::map with std::unordered_map for O(1) average lookup
instead of O(log n) when caching compiled regex patterns.
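The caching idea, sketched with std::regex standing in for whatever regex engine project.cpp actually uses (class and method names are hypothetical). Note that unordered_map rehashing does not invalidate references to stored values, so handing out references to cached entries stays safe:

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical regex cache: compile each pattern once, then look it up in
// O(1) average time instead of the O(log n) of a std::map.
class RegexCache
{
public:
    const std::regex& get(const std::string& pattern)
    {
        auto it = cache.find(pattern);
        if (it == cache.end())
            it = cache.emplace(pattern, std::regex(pattern)).first;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::regex> cache;
};
```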

- Pre-allocate file buffer using size hint to avoid O(n²) reallocation
- Use readFileOptimized() with FILE_FLAG_SEQUENTIAL_SCAN on Windows
- Fallback to standard FileStream for non-Windows or special files
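The pre-allocation point in miniature (a toy helper, not qgrep code): growing a buffer chunk by chunk can reallocate repeatedly, which with a poor growth policy degrades to O(n²) copying, while reserving the known size up front means a single allocation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assemble chunks into one buffer. The size hint (e.g. the file size from
// directory metadata) lets us reserve once instead of reallocating as we go.
std::string assembleBuffer(const std::vector<std::string>& chunks, size_t sizeHint)
{
    std::string data;
    data.reserve(sizeHint); // one allocation up front
    for (const std::string& chunk : chunks)
        data += chunk;
    return data;
}
```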

Use multiple reader threads to overlap file I/O with processing during
index building. Reader threads read ahead while the main thread consumes
files in order, improving throughput on systems with high I/O latency.

The number of reader threads scales with available CPU cores, and a
sliding window prevents readers from getting too far ahead.
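One way to realize this pattern, sketched with std::async standing in for dedicated reader threads (a simplification; the real build.cpp presumably manages its own thread pool, and the function names here are illustrative). Reads are scheduled up to `window` files ahead, while the consumer takes results strictly in the original order:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Read files ahead of the consumer with a sliding window of `window`
// outstanding asynchronous reads, and process results in input order so
// downstream index building stays deterministic.
std::vector<size_t> processAll(const std::vector<std::string>& files,
    std::function<std::string(const std::string&)> readFile, size_t window)
{
    std::vector<size_t> sizes;
    std::vector<std::future<std::string>> pending;
    size_t next = 0; // next file to schedule for reading

    for (size_t i = 0; i < files.size(); ++i)
    {
        // Keep the window full: readers stay at most `window` files ahead.
        while (next < files.size() && next < i + window)
            pending.push_back(std::async(std::launch::async, readFile, files[next++]));

        // Consume in order; blocks only if this read hasn't finished yet.
        std::string data = pending[i].get();
        sizes.push_back(data.size()); // stand-in for the real index processing
    }
    return sizes;
}
```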

wojtek commented Feb 3, 2026

COLD BUILD:
BASELINE: 7.96s, 4.45s, 4.48s → avg 4.47s (excl. cold cache)
IMPROVED: 2.68s, 2.66s, 2.83s → avg 2.72s
SPEEDUP: 1.64x (39% faster)

INCREMENTAL:
BASELINE: 1.14s, 1.16s, 1.16s → avg 1.15s
IMPROVED: 1.14s, 1.15s, 1.13s → avg 1.14s
SPEEDUP: ~1.0x (no change)

Tested on the following config:

qgrep config for Unreal Engine 5.6.1

path E:/UnrealEngine-5.6.1-release/Engine

include \.(ini)$
include \.(cpp|c|h|hpp|cc|inl)$
include \.(ispc|isph)$
include \.(cs|vb)$
include \.(cmake)$
include \.(java|js|kt|kts|ts|tsx)$
include \.(md|rst|txt)$
include \.(pl|py|pm|rb)$
include \.(rs)$
include \.(usf|ush|hlsl|glsl|cg|fx|cgfx)$
include \.(xml|yml|yaml)$
include \.(uplugin|uproject)$
include \.(sh|bat)$
exclude ^DerivedDataCache/
exclude ^Intermediate/

186,283 files, 2GB input, 456MB index

