vulkan: Set k_load_shmem to false when K is too large #19301

jeffbolznv · 2026-02-03T19:21:49Z

See https://github.com/ggml-org/llama.cpp/pull/19075/changes#r2726146004.

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -fa 1 -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -p 512 -n 128 -d 0,4096,16384
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      4434.49 ± 44.89 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        231.50 ± 0.39 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |        907.02 ± 0.94 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |        192.08 ± 0.64 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  pp512 @ d16384 |        261.88 ± 0.30 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  tg128 @ d16384 |        118.39 ± 0.21 |

k_load_shmem after

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -fa 1 -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -p 512 -n 128 -d 0,4096,16384
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      5645.01 ± 46.79 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        230.90 ± 0.37 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |      2941.50 ± 12.58 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |        192.27 ± 0.35 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  pp512 @ d16384 |       1206.25 ± 2.11 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  tg128 @ d16384 |        118.67 ± 0.27 |

vulkan: Set k_load_shmem to false when K is too large

3f73fe8

jeffbolznv requested a review from 0cc4m as a code owner February 3, 2026 19:21

github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Feb 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vulkan: Set k_load_shmem to false when K is too large #19301

vulkan: Set k_load_shmem to false when K is too large #19301

jeffbolznv commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vulkan: Set k_load_shmem to false when K is too large #19301

Are you sure you want to change the base?

vulkan: Set k_load_shmem to false when K is too large #19301

Conversation

jeffbolznv commented Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant