Replies: 6 comments 8 replies
You can follow the examples here to add more kernels. The kernel shapes should cover all quantized linear layers (i.e., attn_q/k/v/o, proj_down, proj_up, gate). These depend on the model's intermediate size and hidden dimension, and on whether the model uses GQA (like llama-3). You can dump the model in PyTorch or llama.cpp to get this information.
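The layer shapes above can be derived directly from the values in a model's `config.json`. A minimal sketch (the helper function name and the dict keys are my own, chosen to match the layer names mentioned above; the example numbers are the well-known llama-3-8b config values, which use GQA):

```python
# Sketch: derive the (out_features, in_features) shape of each quantized
# linear layer from a model's config.json fields. Under GQA, the K/V
# projections are smaller than the Q/O projections because there are
# fewer key-value heads than attention heads.

def linear_layer_shapes(hidden_size, intermediate_size,
                        num_attention_heads, num_key_value_heads):
    """Return {layer_name: (out_features, in_features)} for the
    quantized linear layers of a llama-style transformer block."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = head_dim * num_key_value_heads  # < hidden_size when GQA is used
    return {
        "attn_q":    (hidden_size, hidden_size),
        "attn_k":    (kv_dim, hidden_size),
        "attn_v":    (kv_dim, hidden_size),
        "attn_o":    (hidden_size, hidden_size),
        "gate":      (intermediate_size, hidden_size),
        "proj_up":   (intermediate_size, hidden_size),
        "proj_down": (hidden_size, intermediate_size),
    }

# llama-3-8b: hidden_size=4096, intermediate_size=14336,
# 32 attention heads, 8 KV heads (GQA).
shapes = linear_layer_shapes(4096, 14336, 32, 8)
for name, (out_f, in_f) in shapes.items():
    print(f"{name}: {out_f} x {in_f}")
```

For a qwen2 model you would plug in the `hidden_size`, `intermediate_size`, `num_attention_heads`, and `num_key_value_heads` from its own config instead.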
I also need to deploy T-MAC + llama.cpp with a qwen model on Android. If you know how to add qwen, can you share the details? @jason-zou
I used llama.cpp to print the model details; how can I get the kernel arrays from them, @kaleid-liner?
I checked your commit history and it seems you have already done this great work. I will run a test and report back if I hit any problems.
I have added support for other models in GPTQ format; check the latest news. Also, I suppose you are using qwen2 rather than qwen: qwen is not yet supported by the convert script.
I saw it only supports {llama-2-7b-4bit, llama-2-7b-2bit, llama-2-13b-2bit, llama-3-8b-2bit, llama-3-8b-4bit, hf-bitnet-3b, test}. Can I apply it to other models like qwen?