Skip to content

Conversation

@jeffbolznv
Copy link
Collaborator

See https://github.com/ggml-org/llama.cpp/pull/19075/changes#r2726146004.

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -fa 1 -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -p 512 -n 128 -d 0,4096,16384
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      4434.49 ± 44.89 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        231.50 ± 0.39 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |        907.02 ± 0.94 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |        192.08 ± 0.64 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  pp512 @ d16384 |        261.88 ± 0.30 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  tg128 @ d16384 |        118.39 ± 0.21 |

k_load_shmem after

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench.exe -fa 1 -m c:\models\GLM-4.7-Flash-Q4_K_M.gguf -p 512 -n 128 -d 0,4096,16384
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           pp512 |      5645.01 ± 46.79 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |           tg128 |        230.90 ± 0.37 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |      2941.50 ± 12.58 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |        192.27 ± 0.35 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  pp512 @ d16384 |       1206.25 ± 2.11 |
| deepseek2 30B.A3B Q4_K - Medium |  16.88 GiB |    29.94 B | Vulkan     |  99 |  1 |  tg128 @ d16384 |        118.67 ± 0.27 |

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner February 3, 2026 19:21
@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant