-
Notifications
You must be signed in to change notification settings - Fork 54
feat[dace][next]: Adjusted the handling of non standard GPU Memlets. #1913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[dace][next]: Adjusted the handling of non standard GPU Memlets. #1913
Conversation
Once [PR#1976](spcl/dace#1976) is merged in DaCe the code generator is able to handle more Memlets directly as Cuda `memcpy()` calls. This PR modifies the GPU transformation of GT4Py in such a way that these Memlets are no longer transformed into Maps. However, it can only be merged if the DaCe dependency was bumped to a version that includes PR#1976!
src/gt4py/next/program_processors/runners/dace/transformations/gpu_utils.py
Show resolved
Hide resolved
edopao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU tests pass, although we are on a dace baseline that does not include the PR mentioned in the PR description. I suspect this is because gt4py (and icon4py as well) use symbolic field strides, by default. If this is the case, is it safe to merge this PR?
We have to expand some Memlets into copy Maps, because there is no suitable call to `cudaMemcpy*`, because we have to set their order right. But this is a gross hack, it should be solved once [PR#1913](GridTools#1913) has been merged, but for that DaCe [PR#1976](spcl/dace#1976) has to be merged. The error was that there was the assumption that at least one of these newly created Maps survived. However, this does not seem to be the case, depending in which order MapFusion is applied. Thus the assert was removed. We now assume that if we do not find a newly created MapEntry that is was integrated into a Map that was already present, we further assume that the Map parameter of that Map were used and that they are correct.
We have to expand some Memlets into copy Maps, because there is no suitable call to `cudaMemcpy*`, because we have to set their order right. But this is a gross hack, it should be solved once [PR#1913](#1913) has been merged, but for that DaCe [PR#1976](spcl/dace#1976) has to be merged. The error was that there was the assumption that at least one of these newly created Maps survived. However, this does not seem to be the case, depending in which order MapFusion is applied. Thus the assert was removed. We now assume that if we do not find a newly created MapEntry that is was integrated into a Map that was already present, we further assume that the Map parameter of that Map were used and that they are correct.
|
This PR can merged, since the dace source now points to the GridTools fork repo. |
…2004) We have to expand some Memlets into copy Maps, because there is no suitable call to `cudaMemcpy*`, because we have to set their order right. But this is a gross hack, it should be solved once [PR#1913](GridTools#1913) has been merged, but for that DaCe [PR#1976](spcl/dace#1976) has to be merged. The error was that there was the assumption that at least one of these newly created Maps survived. However, this does not seem to be the case, depending in which order MapFusion is applied. Thus the assert was removed. We now assume that if we do not find a newly created MapEntry that is was integrated into a Map that was already present, we further assume that the Map parameter of that Map were used and that they are correct.
…ridTools#1913) Due to the update to DaCe in [PR#1976](spcl/dace#1976) the code generator is now able to handle more Memlets directly as Cuda `memcpy()` calls. Thus we have to change this function to reflect this case. It even stands to reason that these functions can be removed altogether. Note that the merge has become possible because of our switch to our DaCe fork in [PR#2012](GridTools#2012).
Due to the update to DaCe in PR#1976 the code generator is now able to handle more Memlets directly as Cuda
memcpy()calls. Thus we have to change this function to reflect this case.It even stands to reason that these functions can be removed altogether.
Note that the merge has become possible because of our switch to our DaCe fork in PR#2012.