Replies: 6 comments 8 replies
You can follow the examples here to add more kernels. The kernel shapes should cover all quantized linear layers (i.e., attn_q/k/v/o, proj_down, proj_up, gate). These depend on the model's intermediate size and hidden dimension, and on whether the model uses GQA (like llama-3). You can dump the model in PyTorch or llama.cpp to get this information.
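The layer shapes above can be derived directly from the values in a model's `config.json`. A minimal sketch (the helper function name and the dict keys are my own, chosen to match the layer names mentioned above; the example numbers are the well-known llama-3-8b config values, which use GQA):

```python
# Sketch: derive the (out_features, in_features) shape of each quantized
# linear layer from a model's config.json fields. Under GQA, the K/V
# projections are smaller than the Q/O projections because there are
# fewer key-value heads than attention heads.

def linear_layer_shapes(hidden_size, intermediate_size,
                        num_attention_heads, num_key_value_heads):
    """Return {layer_name: (out_features, in_features)} for the
    quantized linear layers of a llama-style transformer block."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = head_dim * num_key_value_heads  # < hidden_size when GQA is used
    return {
        "attn_q":    (hidden_size, hidden_size),
        "attn_k":    (kv_dim, hidden_size),
        "attn_v":    (kv_dim, hidden_size),
        "attn_o":    (hidden_size, hidden_size),
        "gate":      (intermediate_size, hidden_size),
        "proj_up":   (intermediate_size, hidden_size),
        "proj_down": (hidden_size, intermediate_size),
    }

# llama-3-8b: hidden_size=4096, intermediate_size=14336,
# 32 attention heads, 8 KV heads (GQA).
shapes = linear_layer_shapes(4096, 14336, 32, 8)
for name, (out_f, in_f) in shapes.items():
    print(f"{name}: {out_f} x {in_f}")
```

For a qwen2 model you would plug in the `hidden_size`, `intermediate_size`, `num_attention_heads`, and `num_key_value_heads` from its own config instead.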
I also need to deploy T-MAC + llama.cpp with a qwen model on Android. If you know how to add qwen, can you share the details? @jason-zou
I used llama.cpp to print the model details; how can I get the kernel arrays from them, @kaleid-liner?
I checked your commit history and it seems you have already done this great work. I will run a test and report back if I hit any problems.
I have added support for other models in GPTQ format; check the latest news. Also, I suppose you are using qwen2 rather than qwen: qwen is not yet supported by the convert script.
I saw it only supports {llama-2-7b-4bit, llama-2-7b-2bit, llama-2-13b-2bit, llama-3-8b-2bit, llama-3-8b-4bit, hf-bitnet-3b, test}. Can I apply it to other models like qwen?