Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native #151259
+98
−15
Summary
Fix `[u8]::is_ascii()` performance regression when compiled with `-C target-cpu=native` on AVX-512 CPUs.
Problem
When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one by one, instead of using the efficient `pmovmskb` instruction. This causes a ~22x performance regression.
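To make the pattern concrete, here is a minimal sketch of a chunked counting loop of the kind that LLVM auto-vectorizes. It is a hypothetical reconstruction for illustration, not the exact library code: the name `is_ascii_counting` and the 32-byte `CHUNK_SIZE` are assumptions.

```rust
/// Assumed chunk size for this sketch; the real implementation may differ.
const CHUNK_SIZE: usize = 32;

/// Hypothetical counting-loop ASCII check (illustrative only).
pub fn is_ascii_counting(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(CHUNK_SIZE);
    for chunk in &mut chunks {
        // Branch-free count of ASCII bytes in this chunk. The compiler is
        // free to turn this into SIMD compares followed by a mask reduction,
        // and that reduction is where the instruction choice matters.
        let mut count: u8 = 0;
        for &b in chunk {
            count += (b <= 0x7F) as u8;
        }
        if count != CHUNK_SIZE as u8 {
            return false;
        }
    }
    // Handle the tail (fewer than CHUNK_SIZE bytes) byte by byte.
    chunks.remainder().iter().all(|&b| b <= 0x7F)
}
```

Compiling a function like this with and without `-C target-cpu=native` on an AVX-512 machine is one way to compare the generated mask-extraction code.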
Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.
Solution
Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.
Godbolt Links (Rust 1.92)
`pmovmskb` | `kshiftrd` (broken)
`pmovmskb` | `vpmovmskb` (fixed)
Benchmark Results
AMD Ryzen 5 7500F (Zen 4 with AVX-512):
`-C target-cpu=native`
Note: this is the pure-ASCII path, but the story is similar for the others.
See linked bench project.
Test Plan
`slice-is-ascii-avx512.rs` verifies no `kshiftrd` with AVX-512
`loongarch64`-only (auto-vectorization still used there)
Reproduction / Test Projects
Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
`bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
`fuzz/` - Compares old/new implementations with libfuzzer
Related Issues
`is_ascii` for `str` and `[u8]` further #130733
`kshiftrd` codegen issue: Weird AVX 512 code generated with std::simd when using -Zbuild-std #129293
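Sketch of the Solution approach
As an illustration of the approach described under Solution, the following is a minimal sketch of an ASCII check built on `_mm_movemask_epi8`. It is not the code from this PR: the function names `is_ascii_sse2` and `is_ascii_scalar`, the 16-byte chunking, and the simple scalar tail are all assumptions made for the example.

```rust
/// Scalar check, used for the tail and as the non-x86_64 fallback.
fn is_ascii_scalar(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b <= 0x7F)
}

/// Hypothetical SSE2-based check (illustrative; not the PR's implementation).
#[cfg(target_arch = "x86_64")]
pub fn is_ascii_sse2(bytes: &[u8]) -> bool {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8};

    let mut chunks = bytes.chunks_exact(16);
    for chunk in &mut chunks {
        // SAFETY: SSE2 is part of the x86_64 baseline, `chunk` is exactly
        // 16 readable bytes, and `_mm_loadu_si128` allows unaligned loads.
        let mask = unsafe {
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            // Gathers the high bit of each of the 16 bytes into an i32.
            _mm_movemask_epi8(v)
        };
        // Any set bit means some byte is >= 0x80, i.e. not ASCII.
        if mask != 0 {
            return false;
        }
    }
    is_ascii_scalar(chunks.remainder())
}

/// On other architectures this sketch simply falls back to the scalar check.
#[cfg(not(target_arch = "x86_64"))]
pub fn is_ascii_sse2(bytes: &[u8]) -> bool {
    is_ascii_scalar(bytes)
}
```

Because the mask is requested through an explicit intrinsic rather than left to the auto-vectorizer, the instruction used to extract it no longer depends on how a generic loop happens to be vectorized for the selected target CPU.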