GEMMTestSuite: perform input data generation on GPU (incl. hiprand) #417

matthiasdiener · 2026-01-16T16:55:21Z

Description

Use hiprand random number generation (instead of CPU/OpenMP).

Partly addresses https://github.com/ROCm/frameworks-internal/issues/14746.

Notes:

- The generated numbers are deterministic across executions.
- The generated numbers are not the same as the ones generated by the previous CPU implementation.
- Performance: https://github.com/ROCm/frameworks-internal/issues/14746#issuecomment-3761360289
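
The first two notes follow from how seeded pseudo-random generators behave: a fixed seed reproduces the same sequence on every run, while a different generator algorithm yields different values even for the same seed. The host-only sketch below illustrates this with two standard-library engines as stand-ins. It does not use hiprand, and the helper name `draw` and the seed are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw n raw values from a freshly seeded engine.
// Engine stands in for "the RNG algorithm"; the PR itself generates on the GPU.
template <typename Engine>
std::vector<std::uint32_t> draw(std::uint32_t seed, std::size_t n) {
  Engine engine(seed);
  std::vector<std::uint32_t> out(n);
  for (auto &v : out) {
    v = static_cast<std::uint32_t>(engine());
  }
  return out;
}
```

Calling `draw<std::mt19937>(1234u, 16)` twice gives identical vectors, whereas `draw<std::minstd_rand>(1234u, 16)` gives a different one. This is why reference values recorded against the old CPU generator would not carry over unchanged.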

Type of change

- Documentation change (change only to the documentation, either a fix or new content)
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Infra/Build change
- Code refactoring

Changes

Please list the changes introduced in this PR:

Use hiprand random number generation (instead of CPU/OpenMP).

Checklist:

- I have read and followed the contributing guidelines
- The functionality is complete
- I have commented my code, particularly in hard-to-understand areas
- I have made corresponding changes to the documentation
- My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- New and existing unit tests pass locally with my changes

alextmagro

Awesome work! Do you have any rough estimates of how much faster the entire cpp test suite is?

matthiasdiener · 2026-01-16T21:18:54Z

> Awesome work! Do you have any rough estimates of how much faster the entire cpp test suite is?

Thanks! I left some performance numbers here: https://github.com/ROCm/frameworks-internal/issues/14746#issuecomment-3761360289

ipanfilo · 2026-01-17T03:19:33Z

Update copyright date of modified files

ipanfilo · 2026-01-17T05:23:31Z

tests/cpp/test_common.cu

    TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T,
      {
+#ifdef __HIP_PLATFORM_AMD__
+        fillUniformDevice(t);


Is there any test that covers this generation? I think GPU generation here does not produce the correct result, because t->from_cpu() is called further down in this method.

matthiasdiener · 2026-01-21

Good catch, thanks. I disabled the from_cpu call here, and added an explicit copy of the data to CPU in 0f008e9, as there are other places that copy from CPU->GPU (like set_scale_inv). I also added a new test to check that GPU and CPU copies are the same in 097ecd4.
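
The failure mode raised in this thread is an ordering problem: a host-to-device copy issued after device-side generation replaces the fresh data with whatever the host mirror holds. Below is a host-only model of that ordering, with two `std::vector`s standing in for the device buffer and its CPU mirror. The `MirroredBuffer` type and its methods are hypothetical and do not reflect the test suite's tensor class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a test tensor with a device buffer and a CPU mirror.
// Both are plain host vectors so the sketch runs without a GPU.
struct MirroredBuffer {
  std::vector<float> device;
  std::vector<float> cpu;

  explicit MirroredBuffer(std::size_t n) : device(n, 0.0f), cpu(n, 0.0f) {}

  // Models generating data directly in device memory.
  void fill_on_device(float value) { device.assign(device.size(), value); }

  // Models a CPU->GPU copy: device contents are replaced by the mirror.
  void from_cpu() { device = cpu; }

  // Models an explicit GPU->CPU copy: the mirror is refreshed from the device.
  void to_cpu() { cpu = device; }

  bool mirrors_match() const { return device == cpu; }
};
```

In this model, `fill_on_device` followed by `from_cpu` leaves the device buffer holding the stale zeros, while `fill_on_device` followed by `to_cpu` keeps both copies in agreement. A check along the lines of `mirrors_match()` is the kind of assertion a mirroring test could make.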

ipanfilo · 2026-01-17T05:28:40Z

tests/cpp/test_common.cu

+  rocrand_generate_uniform(gen, tmp, N);
+
+  // map to [-2.0, 1.0] (like generate_data_uniformly) and cast into tensor dtype
+  TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, {


T should either be a template parameter, with no TRANSFORMER_ENGINE_TYPE_SWITCH_ALL here, or the call to this method should be moved out of the TRANSFORMER_ENGINE_TYPE_SWITCH_ALL in fillUniform.

matthiasdiener · 2026-01-21

With the restructuring in bdb8349, I believe this comment is now addressed?

ipanfilo · 2026-01-30T15:45:49Z

Seems not. fillUniformTensorDevice is still called from a t->dtype() switch and in turn calls fillUniformLinearBuferDevice from a t->dtype() switch.

matthiasdiener · 2026-01-30

You're right. What do you think of 33f6124?
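
One way to read the request in this thread is that the dtype dispatch should happen exactly once, at the outermost entry point, with everything below it templated on `T`. A host-only sketch of that shape follows. The `DType` enum, `fill_linear`, and `fill_tensor` are illustrative names, not the PR's functions. The raw samples are passed in rather than generated, and the affine map assumes inputs in (0, 1], the range the curand/hiprand documentation gives for uniform floats, targeting the [-2.0, 1.0] range named in the diff comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DType { kFloat32, kFloat64, kInt32 };

// Inner routine: templated on T, contains no dtype switch of its own.
// Maps each raw sample u to 3*u - 2 and casts the result to T.
template <typename T>
void fill_linear(const std::vector<float> &raw, void *dst) {
  T *out = static_cast<T *>(dst);
  for (std::size_t i = 0; i < raw.size(); ++i) {
    out[i] = static_cast<T>(3.0f * raw[i] - 2.0f);
  }
}

// Outer entry point: the only place that switches on the runtime dtype.
// dst must point to at least raw.size() elements of the selected type.
void fill_tensor(DType dtype, const std::vector<float> &raw, void *dst) {
  assert(dst != nullptr);
  switch (dtype) {
    case DType::kFloat32: fill_linear<float>(raw, dst); break;
    case DType::kFloat64: fill_linear<double>(raw, dst); break;
    case DType::kInt32:   fill_linear<std::int32_t>(raw, dst); break;
  }
}
```

With this layout, adding a helper below `fill_tensor` never requires another switch, because `T` is already fixed by the time the helper is instantiated.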

alextmagro

LGTM! A few small comments, but otherwise excited to see the benefits for our CI.

wangye805

LGTM

GEMMTestSuite: use rocrand for input data generation

14e3b75

matthiasdiener requested a review from alextmagro January 16, 2026 16:55

matthiasdiener self-assigned this Jan 16, 2026

adjust comments

0d4d62f

matthiasdiener marked this pull request as ready for review January 16, 2026 18:37

matthiasdiener requested review from ipanfilo, wangye805 and wenchenvincent as code owners January 16, 2026 18:37

alextmagro reviewed Jan 16, 2026

ipanfilo reviewed Jan 17, 2026

matthiasdiener added 4 commits January 19, 2026 10:52

Merge branch 'dev' into gemmtestsuite-rocrand

f67ef5d

skip copying to device

3f10ed3

move include, use hipify more, fix CPU copy

0f008e9

Merge remote-tracking branch 'origin/dev' into gemmtestsuite-rocrand

64b0d8e

wangye805 requested changes Jan 20, 2026

remove now-superfluous AMD code and disable generate_data_uniformly

4a4d138

matthiasdiener force-pushed the gemmtestsuite-rocrand branch from 090aa75 to 4a4d138 Compare January 20, 2026 16:15

split fill function into linear+frontend

6a89a41

matthiasdiener force-pushed the gemmtestsuite-rocrand branch from bdb8349 to 6a89a41 Compare January 20, 2026 21:49

also offload fillCase_special

b6eee81

matthiasdiener changed the title from "GEMMTestSuite: use rocrand for input data generation" to "GEMMTestSuite: perform input data generation on GPU (incl. rocrand)" on Jan 21, 2026

matthiasdiener added 2 commits January 21, 2026 10:59

Merge remote-tracking branch 'origin/dev' into gemmtestsuite-rocrand

066ae7e

move to curand/hiprand

0ff7067

curand is already used in other places in TE.

matthiasdiener force-pushed the gemmtestsuite-rocrand branch 2 times, most recently from cfb4b0d to cff44ef Compare January 21, 2026 20:24

add test for correct GPU->CPU mirroring

097ecd4

matthiasdiener force-pushed the gemmtestsuite-rocrand branch from cff44ef to 097ecd4 Compare January 21, 2026 20:28

matthiasdiener requested a review from ipanfilo January 21, 2026 21:00

matthiasdiener requested review from alextmagro and wangye805 January 21, 2026 21:00

matthiasdiener changed the title from "GEMMTestSuite: perform input data generation on GPU (incl. rocrand)" to "GEMMTestSuite: perform input data generation on GPU (incl. hiprand)" on Jan 21, 2026

alextmagro approved these changes Jan 21, 2026

remove extra __ifdef__

dfd51e1

wangye805 requested changes Jan 22, 2026

fuse signs and transform kernels

ddfcf2d

wangye805 approved these changes Jan 22, 2026

matthiasdiener added 2 commits January 27, 2026 13:06

Merge branch 'dev' into gemmtestsuite-rocrand

7bcc7ba

Merge remote-tracking branch 'origin/dev' into gemmtestsuite-rocrand

9fe65f8

ipanfilo requested changes Jan 30, 2026

matthiasdiener added 2 commits January 30, 2026 10:44

Merge remote-tracking branch 'origin/dev' into gemmtestsuite-rocrand

9300cdc

clean up type switches

33f6124
