
Conversation

@chraac (Contributor) commented Feb 3, 2026

Key changes

  • Optimization: Implemented in-place F32-to-F16 conversion for row data, eliminating duplicate conversions inside the inner loop.
  • New Compute Primitives: Added fused multiply-add (MAD) support (hvx_mad_f32_f16_aa_rx2) for FP16 inputs with separate scaling factors.

Implementation Details

  • In-place F32 to F16 Conversion: The row conversion logic was hoisted out of the inner loop, so each row is converted once instead of once per iteration.
  • hvx_mad_f32_f16_aa_rx2: A new HVX intrinsic wrapper allowing efficient accumulation of two FP16 vectors, each with its own scaling factor (see the sketch after this list).
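The signature of hvx_mad_f32_f16_aa_rx2 is not shown in this excerpt, so the following is only a scalar reference sketch of the assumed semantics: accumulate two aligned FP16 rows into an FP32 accumulator, each scaled by its own factor. All names and the conversion helper below are illustrative, not the actual HVX implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t f16_t; /* IEEE 754 half precision, stored as raw bits */

/* Minimal FP16 -> FP32 conversion for this sketch (subnormals are flushed
 * to zero here; the real kernel relies on hardware conversion instead). */
static float f16_to_f32(f16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        bits = sign;                                   /* zero / subnormal */
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);      /* inf / NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar model of what hvx_mad_f32_f16_aa_rx2 is assumed to
 * compute: acc[i] += a[i] * scale_a + b[i] * scale_b over n elements,
 * where a and b are FP16 rows and acc is an FP32 accumulator. The real
 * primitive would process aligned ("_aa") data two rows at a time ("_rx2")
 * in HVX vector registers rather than element by element. */
static void mad_f32_f16_rx2_ref(float *acc,
                                const f16_t *a, float scale_a,
                                const f16_t *b, float scale_b,
                                size_t n) {
    for (size_t i = 0; i < n; ++i) {
        acc[i] += f16_to_f32(a[i]) * scale_a + f16_to_f32(b[i]) * scale_b;
    }
}
```

Presumably the point of the rx2 variant is that consuming two rows per call halves the number of accumulation passes compared with issuing two single-row MADs.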

Performance

TODO: Add benchmark comparisons (e.g., tokens/s improvements on target hardware).

@chraac chraac marked this pull request as draft February 3, 2026 03:53
@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Feb 3, 2026
```diff
-const uint8_t * q_ptr_vtcm = dma_queue_pop(dma).dst;
+uint8_t * q_ptr_vtcm = dma_queue_pop(dma).dst;
+if (is_q_fp32) {
+    hvx_copy_f16_f32_aa(q_ptr_vtcm, q_ptr_vtcm, DK); // in-place convert f32 to f16
+}
```
@chraac (author) commented on this diff:
Pre-conversion to F16: Converted the row to F16 upfront to avoid repeated on-the-fly conversion in the code below.
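As a rough illustration of the pre-conversion described above: a sketch only, assuming the inner loop computes dot products against this row. Every name below is a placeholder, not the kernel's real API; convert_f32_to_f16_inplace stands in for hvx_copy_f16_f32_aa from the diff.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t f16_t;

/* Placeholder declarations for the in-place VTCM row conversion and an
 * F16 dot product. */
extern void  convert_f32_to_f16_inplace(void *row, size_t n);
extern float dot_f16(const f16_t *a, const f16_t *b, size_t n);

/* Schematic of the change: previously an F32 -> F16 conversion of the row
 * ran inside the loop, redoing identical work each iteration. Converting
 * the row once, in place, before the loop removes that duplicate work; the
 * F16 data occupies the front half of the same buffer, so no extra scratch
 * memory is needed. */
void score_rows(void *q_row_vtcm, int q_is_f32,
                const f16_t *k_rows, size_t dk,
                size_t n_rows, float *scores) {
    if (q_is_f32) {
        convert_f32_to_f16_inplace(q_row_vtcm, dk);      /* hoisted: once per row */
    }
    const f16_t *q_f16 = (const f16_t *) q_row_vtcm;
    for (size_t j = 0; j < n_rows; ++j) {
        scores[j] = dot_f16(q_f16, k_rows + j * dk, dk); /* no per-iteration conversion */
    }
}
```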

