Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native #151259

bonega · 2026-01-17T17:17:00Z

Summary

Fix [u8]::is_ascii() performance regression when compiled with -C target-cpu=native on AVX-512 CPUs.

Problem

When is_ascii is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 kshiftrd instructions to extract mask bits one-by-one, instead of using the efficient pmovmskb instruction. This causes a ~22x performance regression.

Because is_ascii is marked #[inline], it gets inlined and recompiled with the user's target settings, affecting anyone using -C target-cpu=native on AVX-512 CPUs.
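
For readers unfamiliar with the pattern, here is a minimal sketch of what a chunked counting loop looks like. It is an illustration only, not the verbatim library source: the function name `is_ascii_counting` and the 32-byte chunk size are assumptions made for this example.

```rust
/// Hypothetical stand-in for the pre-fix counting loop (not the real
/// library code). The 32-byte chunk size is an assumption.
pub fn is_ascii_counting(bytes: &[u8]) -> bool {
    const CHUNK: usize = 32;

    let mut chunks = bytes.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // Count the ASCII bytes in this chunk. A branch-free inner loop
        // like this is what the auto-vectorizer picks up.
        let mut count = 0u8;
        for &b in chunk {
            count += (b <= 0x7F) as u8;
        }
        // Any non-ASCII byte makes the count fall short of the chunk size.
        if count != CHUNK as u8 {
            return false;
        }
    }

    // Fewer than CHUNK bytes remain; check them one at a time.
    chunks.remainder().iter().all(|&b| b <= 0x7F)
}
```

Because a loop of this shape carries no explicit instruction choice, the code LLVM emits for it depends entirely on the target features in effect at the call site after inlining.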

Solution

Replace the counting loop with explicit SSE2 intrinsics (_mm_movemask_epi8) that force pmovmskb codegen regardless of CPU features.
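
A rough sketch of the intrinsic-based approach, assuming a 32-byte chunk processed as two 16-byte loads. The function name `is_ascii_sse2`, the chunk size, and the scalar tail handling are assumptions for illustration and may differ from the code in this PR; only the three intrinsics (`_mm_loadu_si128`, `_mm_or_si128`, `_mm_movemask_epi8`) come from the description.

```rust
/// Hypothetical sketch of an SSE2-based ASCII check (x86_64 only).
#[cfg(target_arch = "x86_64")]
pub fn is_ascii_sse2(bytes: &[u8]) -> bool {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_movemask_epi8, _mm_or_si128};

    const CHUNK: usize = 32; // assumed: two 16-byte vectors per iteration

    let mut chunks = bytes.chunks_exact(CHUNK);
    for chunk in &mut chunks {
        // SAFETY: SSE2 is part of the x86_64 baseline, and `chunk` is
        // exactly 32 bytes long, so both unaligned 16-byte loads stay in
        // bounds.
        let mask = unsafe {
            let ptr = chunk.as_ptr() as *const __m128i;
            let lo = _mm_loadu_si128(ptr);
            let hi = _mm_loadu_si128(ptr.add(1));
            // OR the halves together, then gather the top bit of each byte.
            // A set bit means some byte in the chunk is >= 0x80.
            _mm_movemask_epi8(_mm_or_si128(lo, hi))
        };
        if mask != 0 {
            return false;
        }
    }

    // Scalar check for the remaining (< 32) bytes.
    chunks.remainder().iter().all(|b| b.is_ascii())
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn is_ascii_sse2(bytes: &[u8]) -> bool {
    bytes.iter().all(|b| b.is_ascii())
}
```

Spelling out the movemask intrinsic pins the instruction selection, so the result no longer depends on how the auto-vectorizer behaves under wider target features.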

Godbolt Links (Rust 1.92)

Pattern	Target	Link	Result
Counting loop (old)	Default SSE2	https://godbolt.org/z/sE86xz4fY	`pmovmskb`
Counting loop (old)	AVX-512 (znver4)	https://godbolt.org/z/b3jvMhGd3	31x `kshiftrd` (broken)
SSE2 intrinsics (fix)	Default SSE2	https://godbolt.org/z/hMeGfeaPv	`pmovmskb`
SSE2 intrinsics (fix)	AVX-512 (znver4)	https://godbolt.org/z/Tdvdqjohn	`vpmovmskb` (fixed)

Benchmark Results

AMD Ryzen 5 7500F (Zen 4 with AVX-512):

Build	Before	After	Improvement
Default	~73 GB/s	~74 GB/s	No regression
`-C target-cpu=native`	~3 GB/s	~67 GB/s	22x

Note: this is the pure ascii path, but the story is similar for the others.
See linked bench project.

Test Plan

- Assembly test (slice-is-ascii-avx512.rs) verifies no kshiftrd with AVX-512
- Existing codegen test updated to loongarch64-only (auto-vectorization still used there)
- Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
- Benchmarks confirm performance improvement
- Tidy checks pass
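
The old/new comparison in the test plan is a form of differential testing. The sketch below shows the idea with a std-only random generator in place of libFuzzer; it is not the linked fuzz harness. All names here (`XorShift64`, `first_mismatch`, `scalar_is_ascii`, `chunked_is_ascii`) are hypothetical, and the two implementations are simple stand-ins rather than the real old and new code.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs both implementations on random inputs and returns the first input
/// on which they disagree, or `None` if they always agree.
pub fn first_mismatch(
    old: fn(&[u8]) -> bool,
    new: fn(&[u8]) -> bool,
    iterations: usize,
    seed: u64,
) -> Option<Vec<u8>> {
    let mut rng = XorShift64(seed | 1); // xorshift state must be nonzero
    for _ in 0..iterations {
        // Lengths 0..100 cover empty input, short tails, and several chunks.
        let len = (rng.next_u64() % 100) as usize;
        let mut input: Vec<u8> = (0..len).map(|_| (rng.next_u64() & 0x7F) as u8).collect();
        // About half of the non-empty inputs get one non-ASCII byte at a
        // random position.
        if len > 0 && rng.next_u64() & 1 == 1 {
            let pos = (rng.next_u64() % len as u64) as usize;
            input[pos] |= 0x80;
        }
        if old(&input) != new(&input) {
            return Some(input);
        }
    }
    None
}

/// Stand-in reference implementation: byte-by-byte check.
pub fn scalar_is_ascii(bytes: &[u8]) -> bool {
    bytes.iter().all(|b| b.is_ascii())
}

/// Stand-in candidate implementation: 32-byte chunks plus a scalar tail.
pub fn chunked_is_ascii(bytes: &[u8]) -> bool {
    let mut chunks = bytes.chunks_exact(32);
    for chunk in &mut chunks {
        if chunk.iter().filter(|b| b.is_ascii()).count() != 32 {
            return false;
        }
    }
    chunks.remainder().iter().all(|b| b.is_ascii())
}
```

Calling `first_mismatch(scalar_is_ascii, chunked_is_ascii, 10_000, 42)` should return `None` when the two agree. A coverage-guided fuzzer such as libFuzzer explores inputs far more effectively than this uniform sampling.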

Reproduction / Test Projects

Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation

- bench/ - Criterion benchmarks for SSE2 vs AVX-512 comparison
- fuzz/ - Compares old/new implementations with libfuzzer

Related Issues

- Regression introduced in #130733 ("Optimize is_ascii for str and [u8] further")
- Similar AVX-512 kshiftrd codegen issue: #129293 ("Weird AVX 512 code generated with std::simd when using -Zbuild-std")

When `[u8]::is_ascii()` is compiled with `-C target-cpu=native` on AVX-512 CPUs, LLVM generates inefficient code. Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings. The previous implementation used a counting loop that LLVM auto-vectorizes to `pmovmskb` on SSE2, but with AVX-512 enabled, LLVM uses k-registers and extracts bits individually with ~31 `kshiftrd` instructions.

This fix replaces the counting loop with explicit SSE2 intrinsics (`_mm_loadu_si128`, `_mm_or_si128`, `_mm_movemask_epi8`) for x86_64. `_mm_movemask_epi8` compiles to `pmovmskb`, forcing efficient codegen regardless of CPU features.

Benchmark results on AMD Ryzen 5 7500F (Zen 4 with AVX-512):

- Default build: ~73 GB/s → ~74 GB/s (no regression)
- With `-C target-cpu=native`: ~3 GB/s → ~67 GB/s (22x improvement)

The loongarch64 implementation retains the original counting loop since it doesn't have this issue.

Regression from: rust-lang#130733

rustbot · 2026-01-17T17:17:03Z

⚠️ #[rustc_allow_const_fn_unstable] needs careful audit to avoid accidentally exposing unstable
implementation details on stable.

cc @rust-lang/wg-const-eval

rustbot · 2026-01-17T17:17:05Z

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot added the S-waiting-on-review, T-compiler, and T-libs labels Jan 17, 2026

rustbot assigned tgross35 Jan 17, 2026
