
Conversation

@hipudding (Collaborator) commented Jan 31, 2026

Implement the ggml_cann_mul_mat_id_quant function to support quantized matrix multiplication for Mixture of Experts (MoE) architectures on the CANN backend.

Key features:

  • Support Q4_0 and Q8_0 quantized weight formats
  • Use IndexSelect to dynamically route expert-specific weights based on expert indices (see the routing sketch after this list)
  • Leverage WeightQuantBatchMatmulV2 for efficient quantized computation
  • Handle automatic F16 type conversion for hardware compatibility
  • Support both per-expert and broadcast input modes
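
The routing logic can be pictured with a small CPU reference. This is a minimal sketch, assuming a simplified Q8_0-style block layout (a float scale per 32-value block, where ggml actually stores an F16 scale) and contiguous per-expert weights; BlockQ8, matvec_q8, and mul_mat_id_q8_ref are illustrative names, not the CANN implementation, which gathers expert weights with IndexSelect and runs the product through WeightQuantBatchMatmulV2 on device. The broadcast input mode is elided here: each token carries its own activation row.

```cpp
// CPU reference sketch of MUL_MAT_ID with block-quantized expert weights.
// Simplified layout; the CANN kernel performs the same (token, expert)
// loop with IndexSelect + WeightQuantBatchMatmulV2 instead.
#include <cstdint>
#include <vector>

constexpr int QK8_0 = 32;      // values per Q8_0 block (as in ggml)

struct BlockQ8 {               // simplified Q8_0 block (scale kept as float)
    float  d;                  // per-block dequantization scale
    int8_t qs[QK8_0];          // quantized values
};

// y[r] = sum_c dequant(w[r][c]) * x[c] for one expert's weight matrix.
// Assumes cols is a multiple of QK8_0.
static void matvec_q8(const BlockQ8* w, const float* x, float* y,
                      int64_t rows, int64_t cols) {
    const int64_t blocks_per_row = cols / QK8_0;
    for (int64_t r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int64_t b = 0; b < blocks_per_row; ++b) {
            const BlockQ8& blk = w[r * blocks_per_row + b];
            for (int i = 0; i < QK8_0; ++i)
                acc += blk.d * blk.qs[i] * x[b * QK8_0 + i];
        }
        y[r] = acc;
    }
}

// Each token row of x is multiplied by the weights of the n_used experts
// selected for it in `ids` -- the routing this PR implements on device.
void mul_mat_id_q8_ref(const std::vector<std::vector<BlockQ8>>& experts,
                       const float* x, const int32_t* ids, float* dst,
                       int64_t n_tokens, int64_t n_used,
                       int64_t rows, int64_t cols) {
    for (int64_t t = 0; t < n_tokens; ++t)
        for (int64_t e = 0; e < n_used; ++e)
            matvec_q8(experts[ids[t * n_used + e]].data(),  // gather by index
                      x   + t * cols,
                      dst + (t * n_used + e) * rows,
                      rows, cols);
}
```

The double loop over (token, selected expert) mirrors the per-batch, per-expert processing described in the implementation details below.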

Implementation details:

  • Extract expert weights and scales using CANN IndexSelect operation
  • Process each batch and expert combination independently
  • Create proper tensor views with the correct strides for matmul operations (see the view sketch after this list)
  • Cast inputs and outputs to/from F16 automatically as needed
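
For the view/stride step, here is a hedged sketch of the pointer arithmetic involved, assuming a contiguous buffer laid out as [n_batch, n_expert, rows, cols]; View2D and make_expert_view are hypothetical helpers for illustration, not the actual CANN tensor-descriptor API.

```cpp
// Carving a per-(batch, expert) 2-D view out of a larger buffer with
// explicit byte strides, mirroring the "view with correct stride" step.
#include <cstddef>
#include <cstdint>

struct View2D {
    void*   data;     // pointer to the first element of the view
    int64_t ne[2];    // extents: {cols, rows}
    size_t  nb[2];    // byte strides: {per element, per row}
};

// Select the slice for (batch b, expert e) from a contiguous buffer of
// `elem_size`-byte elements shaped [n_batch, n_expert, rows, cols].
static View2D make_expert_view(void* base, int64_t b, int64_t e,
                               int64_t n_expert, int64_t rows, int64_t cols,
                               size_t elem_size) {
    View2D v;
    const size_t row_bytes    = (size_t) cols * elem_size;
    const size_t expert_bytes = (size_t) rows * row_bytes;
    v.data  = (char*) base + ((size_t) b * n_expert + (size_t) e) * expert_bytes;
    v.ne[0] = cols;
    v.ne[1] = rows;
    v.nb[0] = elem_size;
    v.nb[1] = row_bytes;  // contiguous rows; a strided source would differ here
    return v;
}
```

On the device side the same extents and byte strides would be handed to the CANN tensor descriptor rather than dereferenced directly.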

Testing: All test cases passed for supported types (F32, F16, Q4_0, Q8_0).


@hipudding added the Ascend NPU label (issues specific to Ascend NPUs) on Jan 31, 2026
@hipudding self-assigned this on Jan 31, 2026
@github-actions added the ggml label (changes relating to the ggml tensor library for machine learning) on Jan 31, 2026
@hipudding marked this pull request as ready for review on February 3, 2026
@hipudding requested a review from noemotiovon on February 3, 2026