[Hexagon] Optimize argsort with vectorized Quicksort partition by max-krasnyansky · Pull Request #30 · max-krasnyansky/llama.cpp

max-krasnyansky · 2026-02-04T22:21:02Z

This PR optimizes the argsort operation in the Hexagon backend by vectorizing the Quicksort partition loop.

Changes:

ggml/src/ggml-hexagon/htp/hvx-base.h: Added hvx_vec_get_i32 helper function to extract a scalar integer from a vector, necessary for the reduction check.
ggml/src/ggml-hexagon/htp/argsort-ops.c:
- Replaced quicksort_indices_asc and quicksort_indices_desc with quicksort_values_indices_asc and quicksort_values_indices_desc.
- The new sorting functions sort the values scratchpad buffer directly and mirror the swaps to the indices buffer. This allows for contiguous vector loads from the values array, significantly speeding up the partition scan.
- Implemented the partition scanning loop using HVX intrinsics (Q6_Q_vcmp_gt_VsfVsf).
- Implemented a workaround for the missing Q6_Q_all_P instruction by using Q6_V_vmux_QVV to create a mask of 1s/0s and summing them with hvx_vec_reduce_sum_i32 to check if a whole block of 32 elements satisfies the pivot condition.
- Updated htp_argsort_f32 to use the new sorting functions.
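
The "sort the values, mirror the swaps" strategy can be illustrated with a minimal scalar sketch. It is not the PR's code: the names `sketch_argsort_f32_asc`, `sketch_sort_values_indices_asc`, and `swap_pair` are hypothetical, the pivot choice and partition scheme are assumptions, and NaN handling is ignored. It only shows why the values end up contiguous in memory, which is the property the vectorized scan relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Swap two value slots and mirror the same swap in the index array. */
static void swap_pair(float *vals, int32_t *idx, int a, int b) {
    float   tv = vals[a]; vals[a] = vals[b]; vals[b] = tv;
    int32_t ti = idx[a];  idx[a]  = idx[b];  idx[b]  = ti;
}

/* Hypothetical scalar model: ascending quicksort over vals[lo..hi] (inclusive).
 * Comparisons read only vals[], so the scan touches one contiguous array;
 * idx[] is kept in lockstep and ends up holding the argsort result. */
static void sketch_sort_values_indices_asc(float *vals, int32_t *idx, int lo, int hi) {
    while (lo < hi) {
        float pivot = vals[lo + (hi - lo) / 2];  /* assumption: middle-element pivot */
        int i = lo, j = hi;
        while (i <= j) {
            while (vals[i] < pivot) i++;   /* left scan  */
            while (vals[j] > pivot) j--;   /* right scan */
            if (i <= j) {
                swap_pair(vals, idx, i, j);
                i++;
                j--;
            }
        }
        /* Recurse into the smaller side, loop on the larger one. */
        if (j - lo < hi - i) {
            sketch_sort_values_indices_asc(vals, idx, lo, j);
            lo = i;
        } else {
            sketch_sort_values_indices_asc(vals, idx, i, hi);
            hi = j;
        }
    }
}

/* Copy src into a scratch buffer, initialize idx to 0..n-1, then sort both. */
void sketch_argsort_f32_asc(const float *src, float *scratch, int32_t *idx, int n) {
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        scratch[k] = src[k];
        idx[k]     = k;
    }
    sketch_sort_values_indices_asc(scratch, idx, 0, n - 1);
}
```

After the call, `scratch` holds the sorted values and `idx[k]` is the position in `src` of the k-th smallest value. A descending variant would flip the two scan comparisons.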

Performance:
The vectorized scan reduces the number of scalar comparisons and branch mispredictions during the partitioning phase of Quicksort, which is the most compute-intensive part of the operation.
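
The block-wise scan can be modeled in portable C to show where the scalar comparisons go away. This is a sketch under stated assumptions, not the HVX implementation: `count_lanes_less` stands in for the compare, mux, and reduce-sum sequence named in the description (`Q6_Q_vcmp_gt_VsfVsf`, `Q6_V_vmux_QVV`, `hvx_vec_reduce_sum_i32`), `sketch_scan_left` is a hypothetical name, and alignment and tail handling in the actual code may differ.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32  /* 32 f32 lanes per 128-byte vector */

/* Scalar model of compare + mux(1/0) + reduce-sum:
 * returns how many of the 32 lanes satisfy pivot > v[k]. */
static int32_t count_lanes_less(const float *v, float pivot) {
    int32_t sum = 0;
    for (int k = 0; k < LANES; k++) {
        sum += (pivot > v[k]) ? 1 : 0;
    }
    return sum;
}

/* Left-side partition scan: returns the first index i in [start, end + 1]
 * for which vals[i] < pivot does not hold. Whole blocks are skipped while
 * every lane passes; the first failing block is finished element by element. */
int sketch_scan_left(const float *vals, int start, int end, float pivot) {
    assert(start >= 0);
    int i = start;
    /* One "all lanes pass" check replaces 32 compare-and-branch steps. */
    while (i + LANES - 1 <= end && count_lanes_less(vals + i, pivot) == LANES) {
        i += LANES;
    }
    /* Scalar fallback for the block that contains the stop position or the tail. */
    while (i <= end && vals[i] < pivot) {
        i++;
    }
    return i;
}
```

In this model the loop branches once per 32 elements while the scan runs through values below the pivot, and only the final block falls back to per-element branches.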

PR created automatically by Jules for task 9639307229427924630 started by @max-krasnyansky

Replaced the scalar Quicksort implementation with a vectorized version using HVX intrinsics.

- Changed sorting strategy to direct sort on values buffer with mirrored index swaps for better vectorization.
- Added `hvx_vec_get_i32` to `hvx-base.h`.
- Implemented partition loop using vector comparisons and reduction-based "all check" (workaround for missing `Q6_Q_all_P`).

Co-authored-by: max-krasnyansky <1380796+max-krasnyansky@users.noreply.github.com>

google-labs-jules · 2026-02-04T22:21:04Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

max-krasnyansky and others added 2 commits February 4, 2026 13:46

hexagon: add ARGSORT op

5aaf5de

github-actions bot added the ggml label Feb 4, 2026

max-krasnyansky force-pushed the master branch from 5aaf5de to bdcb213 Compare February 5, 2026 02:24

max-krasnyansky closed this Feb 5, 2026

max-krasnyansky wants to merge 2 commits into master from hexagon-vector-argsort-9639307229427924630
