CANN: added q4_1 and q8_1 quantization support for CANN backend #21
Open
thefish111 wants to merge 1 commit into noemotiovon:master from
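[CANN] Add support for the q4_1 and q8_1 quantization types
Summary
Add support for the q4_1 and q8_1 quantization types.
Problem description
The project currently has no forward-pass support for the q4_1 and q8_1 quantization types on the CANN backend.
Motivations
Goals
Detailed Design
Passing the MUL_MAT test cases mainly requires adding the block q4_1 and block q8_1 data types, the conversion and reverse-conversion logic between them and the types supported by the CANN interface, and entry points for q4_1 and q8_1 at the call site.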
Block format definitions for q4_1 and q8_1:
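For reference, the two block layouts can be sketched roughly as follows. This is a simplified mirror of the definitions as I understand them from upstream ggml, not a copy of the code in this PR: `ggml_half` is modelled as a raw 16-bit value, and the comments on `m` and `s` reflect the usual ggml convention, which is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: ggml_half is a 16-bit IEEE half; modelled here as raw bits.
using ggml_half = uint16_t;

constexpr int QK4_1 = 32;  // elements per q4_1 block
constexpr int QK8_1 = 32;  // elements per q8_1 block

// q4_1: 32 unsigned 4-bit quants plus a scale and a minimum.
struct block_q4_1 {
    ggml_half d;              // scale (delta)
    ggml_half m;              // minimum (offset)
    uint8_t   qs[QK4_1 / 2];  // two 4-bit quants packed per byte
};

// q8_1: 32 signed 8-bit quants plus a scale and a precomputed sum term.
struct block_q8_1 {
    ggml_half d;          // scale (delta)
    ggml_half s;          // assumed to hold d * sum(qs[i])
    int8_t    qs[QK8_1];  // signed 8-bit quants
};

// Both layouts are expected to be tightly packed.
static_assert(sizeof(block_q4_1) == 2 * sizeof(ggml_half) + QK4_1 / 2,
              "unexpected q4_1 block size");
static_assert(sizeof(block_q8_1) == 2 * sizeof(ggml_half) + QK8_1,
              "unexpected q8_1 block size");
```

Compared with q4_0 and q8_0, each block carries one additional half-precision field (`m` or `s`), which is the extra value the new code paths have to read.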
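ggml_cann_compute_forward -> ggml_cann_mul_mat -> ggml_cann_mul_mat_quant
ggml_cann_mul_mat_quant is modified to add q4_1 and q8_1 entry points, together with logic to read and pass on the extra half-precision field that the two new types carry.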
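The kind of per-type branching this implies might look like the sketch below. Every name in it (`quant_type`, `quant_layout`, `layout_for`) is hypothetical and exists only for illustration; it is not the ggml or CANN API, and it assumes q4_0 and q8_0 were the types the quantized path handled before this change.

```cpp
#include <cstdint>

// Hypothetical stand-in for the ggml type enum; only the values needed here.
enum class quant_type { Q4_0, Q4_1, Q8_0, Q8_1, F16 };

// Hypothetical summary of what the quantized mul_mat path needs per type.
struct quant_layout {
    bool supported;        // whether the quantized path handles this type
    int  bits_per_quant;   // width of each stored quant
    bool has_second_half;  // true when a block carries a second fp16 field
};

inline quant_layout layout_for(quant_type t) {
    switch (t) {
        case quant_type::Q4_0: return {true, 4, false};
        case quant_type::Q8_0: return {true, 8, false};
        case quant_type::Q4_1: return {true, 4, true};  // new: d and m
        case quant_type::Q8_1: return {true, 8, true};  // new: d and s
        default:               return {false, 0, false};
    }
}
```

A caller would branch on `has_second_half` to decide whether a second per-block tensor needs to be built and passed along with the scales.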
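Next, to fit the WeightQuantBatchMatmulV2 interface provided by the CANN backend, the data conversion logic for the new types is implemented. It has two parts, format and values, and the result is always converted into tensors of ACL-supported types before being passed in.
For the format part, the quantized weights, which are stored blockwise, are rearranged by element kind, and a tensor is created for each kind.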
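A minimal sketch of that regrouping for q4_1, assuming the block layout used by upstream ggml. The `q4_1_planes` container and `split_q4_1` function are hypothetical names; the sketch copies the packed bytes unchanged and does not attempt any nibble reordering or ACL tensor creation, both of which the actual transform functions may need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using ggml_half = uint16_t;  // assumption: raw fp16 bits
constexpr int QK4_1 = 32;

struct block_q4_1 {
    ggml_half d;              // scale
    ggml_half m;              // minimum (offset)
    uint8_t   qs[QK4_1 / 2];  // packed 4-bit quants
};

// Hypothetical container: one contiguous array per element kind.
struct q4_1_planes {
    std::vector<uint8_t>   quants;   // packed quants of all blocks, in order
    std::vector<ggml_half> scales;   // one d per block
    std::vector<ggml_half> offsets;  // one m per block
};

// Split an array-of-structs block buffer into a struct of arrays.
inline q4_1_planes split_q4_1(const block_q4_1 * blocks, size_t n_blocks) {
    assert(blocks != nullptr || n_blocks == 0);
    q4_1_planes out;
    out.quants.reserve(n_blocks * (QK4_1 / 2));
    out.scales.reserve(n_blocks);
    out.offsets.reserve(n_blocks);
    for (size_t i = 0; i < n_blocks; ++i) {
        out.quants.insert(out.quants.end(), blocks[i].qs, blocks[i].qs + QK4_1 / 2);
        out.scales.push_back(blocks[i].d);
        out.offsets.push_back(blocks[i].m);
    }
    return out;
}
```

Each of the three arrays would then back its own tensor (weights, scales, offsets). The reverse direction interleaves them back into blocks.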
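For the value part, q4_1 dequantization gains support for the reverse conversion, which mainly consists of converting qs to signed values and then compensating the offset by 8*d. For q8_1, only the read logic inside a block changes; no numeric conversion is applied.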
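The reason for the 8*d term can be checked with a small sketch. It assumes the usual q4_1 convention x = d*q + m with q an unsigned nibble in [0, 15]. Reinterpreting the nibble as a signed 4-bit value q' = q - 8 gives x = d*q' + (m + 8*d), so the offset has to grow by 8*d for the result to stay the same. The function names are hypothetical, and how the compensated offset is finally handed to the CANN operator is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Reference q4_1 dequantization with an unsigned nibble q in [0, 15].
inline float dequant_unsigned(float d, float m, uint8_t q) {
    assert(q < 16);
    return d * static_cast<float>(q) + m;
}

// Map an unsigned nibble to the signed 4-bit value q - 8.
// Flipping the top bit yields the two's complement pattern of (q - 8),
// which is then sign-extended to 8 bits.
inline int8_t to_signed_nibble(uint8_t q) {
    assert(q < 16);
    const int flipped = q ^ 0x8;
    return static_cast<int8_t>(flipped >= 8 ? flipped - 16 : flipped);
}

// Offset after compensation for the shift of the quant range by 8.
inline float compensated_offset(float d, float m) {
    return m + 8.0f * d;
}

// Dequantization using the signed nibble and the compensated offset.
inline float dequant_signed(float d, float m, uint8_t q) {
    return d * static_cast<float>(to_signed_nibble(q)) + compensated_offset(d, m);
}
```

For any d, m, and q in range, `dequant_signed` and `dequant_unsigned` agree up to floating-point rounding. q8_1 quants are already signed 8-bit values, which is consistent with no numeric conversion being needed there.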
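For details, see the implementation of the transform/transform_back functions in ggml_cann.cpp.
Test results:
With q4_1 and q8_1 as type a, all of the no-permutation cases supported by the original project pass.
Notes:
Even when -b CANN is specified, test_backend_ops still runs the fallback CPU backend. For testing, a placeholder for the q8_1_vec_dot forward logic on the CPU was added, but that change to the test_backend file was not kept when submitting this PR. To test q8_1, temporarily disable adding the CPU backend.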