Skip to content

feat: implement quicker backend/posix walk algorithm#1889

Open
benmcclelland wants to merge 1 commit intomainfrom
ben/fast-walk
Open

feat: implement quicker backend/posix walk algorithm#1889
benmcclelland wants to merge 1 commit intomainfrom
ben/fast-walk

Conversation

@benmcclelland
Copy link
Member

Rewrite the posix Walk implementation to avoid the extra ReadDir per directory that was noted as a TODO in the old code. The new algorithm holds all traversal state in a walkState struct and uses processDir to interleave sibling entries in correct S3 lexicographic order without a second syscall.

Key changes:
prefix optimisation: jump directly into the deepest matching directory rather than scanning from the root on every call

marker short-circuit: skip entire subtrees that are lexically before the marker, making paginated listing faster

@benmcclelland
Copy link
Member Author

@tonyipm I added a benchmark test in here, and got the following results (but didnt run many times, so these may not be statistically significant):

Benchmark main (old) new Δ
WalkFlat 11.26 ms 12.12 ms +7.6% slower
WalkFlatDelim 11.10 ms 12.06 ms +8.6% slower
WalkFlatPaged 22.94 ms 26.99 ms +17.6% slower
WalkDeep 13.21 ms 12.34 ms -6.6% faster
WalkDeepDelim 87.2 µs 70.5 µs -19.1% faster
WalkWide 28.68 ms 27.71 ms -3.4% faster
WalkWideDelim 154.3 µs 127.1 µs -17.6% faster
WalkWithPrefix 220.2 µs 269.6 µs +22.4% slower
WalkWithPrefixDelim 38.37 µs 33.95 µs -11.5% faster
WalkWithMarker 1.384 ms 1.029 ms -25.6% faster
geomean 1.741 ms 1.676 ms -3.7%

Can you try running the tests, or adding to the benchmarks to see if this really is speeding up the cases in question. Also, we should consider if this is slowing down any cases that we might need to address.

@benmcclelland
Copy link
Member Author

tested again with some larger directories, these are relative to current main:

Benchmark main new Δ p-value
WalkFlatPaged 25.0 ms 23.4 ms -6.6% p=0.004 ✓
WalkDeepDelim 95.9 µs 77.6 µs -19.0% p=0.002 ✓
WalkWideDelim 171 µs 135 µs -21.2% p=0.002 ✓
WalkWithPrefix 256 µs 289 µs +12.7% p=0.004 ✓
WalkWithPrefixDelim 46.7 µs 39.0 µs -16.6% p=0.004 ✓
WalkWithMarker 1.64 ms 1.08 ms -34.4% p=0.041 ✓
WalkFlat 12.6 ms 11.9 ms ~ p=0.394 (noise)
WalkFlatDelim 12.0 ms 12.0 ms ~ p=0.699 (noise)
WalkDeep 14.1 ms 13.4 ms ~ p=0.485 (noise)
WalkWide 32.1 ms 31.4 ms ~ p=0.818 (noise)
geomean 1.95 ms 1.74 ms -10.8%  

Rewrite the posix Walk implementation to avoid the extra ReadDir per
directory that was noted as a TODO in the old code.  The new algorithm
holds all traversal state in a walkState struct and uses processDir to
interleave sibling entries in correct S3 lexicographic order without a
second syscall.

Key changes:
prefix optimisation: jump directly into the deepest matching directory
rather than scanning from the root on every call

marker short-circuit: skip entire subtrees that are lexically before
the marker, making paginated listing faster

Co-authored-by: Ben McClelland <ben.mcclelland@versity.com>
@tonyipm
Copy link

tonyipm commented Feb 24, 2026

Thanks Ben I will take a look - in my own performance testing I found it consistently faster than the old algo and where performance was similar I ran a profile and found nearly all the elapsed time was in os.ReadDir. I will see if I can run up your perf tests and profile them.

@benmcclelland
Copy link
Member Author

Thanks Ben I will take a look - in my own performance testing I found it consistently faster than the old algo and where performance was similar I ran a profile and found nearly all the elapsed time was in os.ReadDir. I will see if I can run up your perf tests and profile them.

I didn't do extensive testing, but just a few cases of "large" directories vs fan out into multiple directories. I should probably fire up some tests with really large directories since thats the real case we are trying to solve here. Under a million files seems too fast on my system for any useful analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants