-
Notifications
You must be signed in to change notification settings - Fork 23
GEMMTestSuite: perform input data generation on GPU (incl. hiprand) #417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
alextmagro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Do you have any rough estimates of how much faster the entire cpp test suite is?
Thanks! I left some performance numbers here: https://github.com/ROCm/frameworks-internal/issues/14746#issuecomment-3761360289 |
|
Update copyright date of modified files |
tests/cpp/test_common.cu
Outdated
| TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, | ||
| { | ||
| #ifdef __HIP_PLATFORM_AMD__ | ||
| fillUniformDevice(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any test that tests this generation? I think using GPU generation here does not produce correct result because of using t->from_cpu() below in this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests/cpp/test_common.cu
Outdated
| rocrand_generate_uniform(gen, tmp, N); | ||
|
|
||
| // map to [-2.0, 1.0] (like generate_data_uniformly) and cast into tensor dtype | ||
| TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T should either be template parameter and no TRANSFORMER_ENGINE_TYPE_SWITCH_ALL here, or the method calling should be moved out of TRANSFORMER_ENGINE_TYPE_SWITCH_ALL in fillUniform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the restructuring in bdb8349, I believe this comment is now addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not. fillUniformTensorDevice is still called from t->dtype() switch and in turn calls fillUniformLinearBuferDevice from t->dtype() switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. What do you think of 33f6124?
090aa75 to
4a4d138
Compare
bdb8349 to
6a89a41
Compare
curand is already used in other places in TE.
cfb4b0d to
cff44ef
Compare
cff44ef to
097ecd4
Compare
alextmagro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! A few small comments, but otherwise excited to see the benefits for our CI.
wangye805
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
tests/cpp/test_common.cu
Outdated
| rocrand_generate_uniform(gen, tmp, N); | ||
|
|
||
| // map to [-2.0, 1.0] (like generate_data_uniformly) and cast into tensor dtype | ||
| TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not. fillUniformTensorDevice is still called from t->dtype() switch and in turn calls fillUniformLinearBuferDevice from t->dtype() switch
Description
Use
hiprandrandom number generation (instead of CPU/OpenMP).Partly addresses https://github.com/ROCm/frameworks-internal/issues/14746.
Notes:
Type of change
Changes
Please list the changes introduced in this PR:
hiprandrandom number generation (instead of CPU/OpenMP).Checklist: