CANN: Support more functions in FA operator #20
Open
glitter4 wants to merge 1 commit into noemotiovon:master from
Conversation
# Description

This PR enhances the `flash_attn_ext` operator in the CANN backend and relaxes several of its restrictions:

1. **Removed the KV head-size restriction**: K and V no longer need the same head dimension (`ne[0]`); Q, K, and V may now have different head dimensions (they are automatically padded to the largest one).
2. **Implemented logitSoftcap**: the operator now supports applying `logitSoftcap` to the attention scores.
3. **Removed the multiple-of-16 head-size restriction**: an internal padding mechanism (`pad_to_max_dim`) lifts the hard requirement that the head size be a multiple of 16, enabling models such as DeepSeek MLA (head size 576).

# Testing

1. **logitSoftcap test**: ran inference with a model that uses the `logitSoftcap` parameter and verified that the results are correct.
2. **Size-compatibility tests**:
   - Tested K and V with mismatched head dimensions.
   - Tested head sizes that are not multiples of 16 (e.g. 192, 576), verifying the padding logic and the computed results.
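The softcap step in point 2 is commonly formulated as `cap * tanh(score / cap)`, which squashes attention scores smoothly into the interval (-cap, cap). A minimal NumPy sketch of that formulation follows; the function name is illustrative and is not the CANN kernel's actual API:

```python
import numpy as np

def apply_logit_softcap(scores: np.ndarray, cap: float) -> np.ndarray:
    """Softcap attention scores into the open interval (-cap, cap).

    Uses the common formulation cap * tanh(x / cap). A cap of 0
    conventionally disables the step (as in the test logs below,
    where logit_softcap=0.000000).
    """
    if cap == 0.0:
        return scores
    return cap * np.tanh(scores / cap)

# Small scores pass through nearly unchanged; large ones saturate near the cap.
scores = np.array([0.1, 5.0, 500.0])
capped = apply_logit_softcap(scores, 30.0)
```

Because `tanh` is nearly linear around zero, typical in-range scores are barely perturbed, while outliers are bounded, which is the property the softcap exists to provide.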
Owner
Fantastic contribution, great work! Could you please also contribute this to the upstream community?
Example `FLASH_ATTN_EXT` test output:
```
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,3],kv=512,nb=35,mask=0,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,1,2,3]): OK
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,3],kv=512,nb=35,mask=0,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=bf16,permute=[0,1,2,3]): OK
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,1],kv=512,nb=35,mask=1,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,1,2,3]): OK
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,1],kv=512,nb=35,mask=1,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=f16,permute=[0,2,1,3]): OK
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,1],kv=512,nb=35,mask=1,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=bf16,permute=[0,1,2,3]): OK
FLASH_ATTN_EXT(hsk=40,hsv=40,nh=4,nr23=[4,1],kv=512,nb=35,mask=1,sinks=0,max_bias=0.000000,logit_softcap=0.000000,prec=f32,type_KV=bf16,permute=[0,2,1,3]): OK
```
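The pad-to-max behavior exercised by the size-compatibility tests can be illustrated with a small sketch. Zero-padding the head dimension is lossless for the attention math: extra zeros in Q and K contribute nothing to the QK^T dot products, and zero columns in V only produce output columns that are sliced off. The helper name `pad_to_max_dim` comes from the PR text, but this NumPy version is an illustration, not the CANN implementation (which pads `ne[0]` inside the operator):

```python
import numpy as np

def pad_to_max_dim(*tensors):
    """Zero-pad the last axis (the head dimension here) of each tensor
    up to the largest head dimension among them. Illustrative only;
    named after the helper described in the PR."""
    d_max = max(t.shape[-1] for t in tensors)
    return [np.pad(t, [(0, 0)] * (t.ndim - 1) + [(0, d_max - t.shape[-1])])
            for t in tensors]

# DeepSeek-MLA-like shapes: K head dim 576, V head dim 512.
q = np.random.rand(4, 576)   # (query tokens, head_dim)
k = np.random.rand(8, 576)   # (kv tokens, head_dim)
v = np.random.rand(8, 512)   # (kv tokens, smaller head_dim)
qp, kp, vp = pad_to_max_dim(q, k, v)
# QK^T is unchanged by the padding; V's padded output columns are all zero.
out = (qp @ kp.T) @ vp
```

Slicing `out` back to the first 512 columns recovers exactly the result of the unpadded computation, which is why the operator can pad internally without affecting correctness.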
# Notes

None