tests : add non-cont, inplace rope tests #19296
Conversation
Yeah, the new tests are failing in the Vulkan backend. I'll look into it.
Thank you. If it works, I will close #19292.
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
Huh, adding the dim 3 case somehow hid the CUDA failures from earlier: https://github.com/ggml-org/llama.cpp/actions/runs/21636838841/job/62364670686#step:3:48919 I guess I have to add both tests for
Hi all, I can confirm that Jeff's patch fixes #19292 for me on Vulkan! It works fine and fixes the incoherence-after-shifting regression on GLM-4-32B-0414 and GLM-4.7-Flash. Much appreciated, all. However, is anyone able to check that this test case is good for CUDA too? I have heard that a similar issue affected a CUDA RTX 5xxx user on GLM-4.7-Flash, so it would be nice if someone could confirm the test passes on CUDA.
It's very likely broken with CUDA in a similar way - see the test failure from earlier: https://github.com/ggml-org/llama.cpp/actions/runs/21636838841/job/62364670686#step:3:48919 Once we stabilize the tests, we'll notify CUDA maintainers to take a look.
Thanks! Once there is a fix for CUDA, I'll let the CUDA user try it out again.
@JohannesGaessler Note that some of the newly added rope tests are currently failing with CUDA: https://github.com/ggml-org/llama.cpp/actions/runs/21664340312/job/62455946368
ref #18986 (comment)
ref #19128 (comment)
ref #19292
This type of rope is needed during rope shift:
llama.cpp/src/llama-kv-cache.cpp
Lines 1626 to 1635 in cf36c21
We are updating the KV cache data in place, so the op is inplace. Since the changes in #18986, the tensor is now also non-contiguous.
Based on the discussion in #18986 (comment), it appears that the Vulkan backend currently does not support this case. cc @jeffbolznv @0cc4m
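As a loose illustration of why "inplace + non-contiguous" is a distinct case for a backend to support, here is a plain NumPy sketch (not ggml or llama.cpp code; the rotation below is a simplified stand-in for RoPE): an in-place rotation is applied through a strided view, so the kernel must write back through the view's strides rather than assume a dense buffer.

```python
import numpy as np

def rope_like_inplace(x, theta=0.5):
    # Rotate pairs (x[..., 2i], x[..., 2i+1]) in place, a RoPE-style update.
    # Writing through x mutates the underlying (possibly strided) buffer.
    c, s = np.cos(theta), np.sin(theta)
    even, odd = x[..., 0::2].copy(), x[..., 1::2].copy()
    x[..., 0::2] = c * even - s * odd
    x[..., 1::2] = s * even + c * odd

buf = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # stand-in KV buffer
view = buf[:, ::2, :]                 # non-contiguous view (every other row)
assert not view.flags["C_CONTIGUOUS"]
rope_like_inplace(view)               # updates rows 0 and 2; row 1 untouched
```

The point of the new tests is analogous: the rope op must handle exactly this combination (writing its result into the same tensor it reads, through non-unit strides), which is what the rope-shift path in `llama-kv-cache.cpp` exercises.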