@fwyzard
Contributor

@fwyzard fwyzard commented Dec 13, 2025

This release implements a new interface, similar to std::transform, that simplifies writing asynchronous parallel algorithms across all back-ends. SYCL support is extended to NVIDIA and AMD GPUs. The release introduces unified memory and expands asynchronous memory allocation to buffers of any dimension. Interoperability with standard C++ is improved through std::span support: alpaka buffers expose a span interface, and any std::span can be used as an alpaka view. The release also adds compile-time warp-size definitions, extends the atomic increment and decrement operations and fixes their behaviour on the CPU back-end, and introduces a C++ concept for alpaka accelerators together with new type traits, along with many smaller fixes and improvements.
The CI has been updated to test newer operating systems and compilers, including Clang 20 and ROCm 6.3, 6.4, and 7.0.

The full list of changes is available in the ChangeLog.
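
For readers less familiar with the pattern mentioned above, here is a minimal sketch of `std::transform` itself, in standard C++ only. It does not use or reproduce the new alpaka interface, whose names and signatures are not given in this thread; the helpers `scaled` and `added` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Standard C++ only: this is NOT the alpaka interface, just the
// std::transform pattern that the release notes compare it to.

// Hypothetical helper: multiply every element of `input` by `factor`.
std::vector<float> scaled(const std::vector<float>& input, float factor) {
  std::vector<float> output(input.size());
  // Unary form: apply the callable to each element of [begin, end)
  // and store the results starting at output.begin().
  std::transform(input.begin(), input.end(), output.begin(),
                 [factor](float x) { return factor * x; });
  return output;
}

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<float> added(const std::vector<float>& a,
                         const std::vector<float>& b) {
  assert(a.size() == b.size());  // the binary form reads a.size() elements from b
  std::vector<float> output(a.size());
  // Binary form: the callable receives one element from each input range.
  std::transform(a.begin(), a.end(), b.begin(), output.begin(),
                 [](float x, float y) { return x + y; });
  return output;
}
```

The point of the pattern is that the caller supplies only the input ranges, the output range and a per-element callable, and the iteration itself is left to the algorithm.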

@fwyzard
Contributor Author

fwyzard commented Dec 13, 2025

enable gpu

@fwyzard
Contributor Author

fwyzard commented Dec 13, 2025

please test

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for branch IB/CMSSW_16_0_X/master.

@akritkbehera, @iarspider, @raoatifshad, @smuzaffar can you please review it and eventually sign? Thanks.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Dec 13, 2025

cms-bot internal usage

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/49947/summary.html
COMMIT: c14809b
CMSSW: CMSSW_16_0_X_2025-12-12-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10250/49947/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed External Build

I found a compilation warning when building. See details on the summary page.

@fwyzard
Contributor Author

fwyzard commented Dec 13, 2025

please test

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/49948/summary.html
COMMIT: c14809b
CMSSW: CMSSW_16_0_X_2025-12-12-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10250/49948/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Build

I found a compilation error when building:

Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestAtomicPairCounterROCmAsync/libalpakaTestAtomicPairCounterROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestBufferROCmAsync/libalpakaTestBufferROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a] Error 1
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestIndependentKernelROCmAsync/libalpakaTestIndependentKernelROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelROCmAsync/libalpakaTestKernelROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneHistoContainerROCmAsync/libalpakaTestOneHistoContainerROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneRadixSortROCmAsync/libalpakaTestOneRadixSortROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneToManyAssocROCmAsync/libalpakaTestOneToManyAssocROCmAsync_rocm.a to productstore area:


@cmsbuild
Contributor

Pull request #10250 was updated.

@fwyzard
Contributor Author

fwyzard commented Dec 13, 2025

please test

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/49963/summary.html
COMMIT: 8c4996c
CMSSW: CMSSW_16_0_X_2025-12-13-1100/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10250/49963/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed Build

I found a compilation error when building:

Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestBufferROCmAsync/libalpakaTestBufferROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestIndependentKernelROCmAsync/libalpakaTestIndependentKernelROCmAsync_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestHistoContainerROCmAsync/libalpakaTestHistoContainerROCmAsync_rocm.a] Error 1
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelROCmAsync/libalpakaTestKernelROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneHistoContainerROCmAsync/libalpakaTestOneHistoContainerROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneRadixSortROCmAsync/libalpakaTestOneRadixSortROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestOneToManyAssocROCmAsync/libalpakaTestOneToManyAssocROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc13/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestPrefixScanROCmAsync/libalpakaTestPrefixScanROCmAsync_rocm.a to productstore area:


@fwyzard
Contributor Author

fwyzard commented Dec 14, 2025

please test with #10238

@fwyzard
Contributor Author

fwyzard commented Dec 19, 2025

please test

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/50363/summary.html
COMMIT: e3c6e9d
CMSSW: CMSSW_16_1_X_2025-12-18-2300/el8_amd64_gcc13
Additional Tests: GPU,AMD_MI300X,AMD_W7900,NVIDIA_H100,NVIDIA_L40S,NVIDIA_T4
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/10250/50363/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/50363/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/50363/git-merge-result

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4280393
  • DQMHistoTests: Total failures: 119
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4280254
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

AMD_MI300X Comparison Summary

Summary:

  • You potentially removed 5 lines from the logs
  • Reco comparison results: 254 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 31382
  • DQMHistoTests: Total nulls: 13
  • DQMHistoTests: Total successes: 117976
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

AMD_W7900 Comparison Summary

Summary:

  • You potentially removed 3 lines from the logs
  • Reco comparison results: 253 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 31881
  • DQMHistoTests: Total nulls: 11
  • DQMHistoTests: Total successes: 117479
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

NVIDIA_L40S Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • Reco comparison results: 254 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 28826
  • DQMHistoTests: Total nulls: 7
  • DQMHistoTests: Total successes: 120538
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

NVIDIA_T4 Comparison Summary

Summary:

  • You potentially removed 7 lines from the logs
  • Reco comparison results: 233 differences found in the comparisons
  • Reco comparison had 6 failed jobs
  • DQMHistoTests: Total files compared: 11
  • DQMHistoTests: Total histograms compared: 149371
  • DQMHistoTests: Total failures: 36933
  • DQMHistoTests: Total nulls: 9
  • DQMHistoTests: Total successes: 112429
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (10 files compared)
  • Checked 42 log files, 45 edm output root files, 11 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

The 16.0.x backport tests were successful.

The same errors are observed on the H100 GPUs for the same workflows in an unrelated PR, so I suspect they are independent of this update.

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

Let's first try re-running the tests.

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

test parameters:

  • enable = gpu
  • gpu = nvidia_h100

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

please test

@cmsbuild
Contributor

-1

Failed Tests: RelVals-NVIDIA_H100
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/50381/summary.html
COMMIT: e3c6e9d
CMSSW: CMSSW_16_1_X_2025-12-21-0000/el8_amd64_gcc13
Additional Tests: GPU,NVIDIA_H100
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10250/50381/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-NVIDIA_H100

  • 29834.404: 29834.404_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Profiling/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Profiling.log
  • 29834.403: 29834.403_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation.log
  • 29834.402: 29834.402_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka/step2_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka.log

More RelVal errors were reported; the list above is truncated.

Comparison Summary

Summary:

  • You potentially removed 6 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 14 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4280553
  • DQMHistoTests: Total failures: 70
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4280463
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

OK, the H100 runner is clearly broken :-/

@fwyzard
Contributor Author

fwyzard commented Dec 21, 2025

ignore tests-rejected with ib-failure

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Jan 5, 2026

-1

Failed Tests: RelVals-NVIDIA_H100
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8e0c53/50430/summary.html
COMMIT: e3c6e9d
CMSSW: CMSSW_16_1_X_2026-01-04-2300/el8_amd64_gcc13
Additional Tests: GPU,NVIDIA_H100
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/10250/50430/install.sh to create a dev area with all the needed externals and cmssw changes.

Failed RelVals-NVIDIA_H100

  • 34634.403: 34634.403_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka_Validation.log
  • 34634.402: 34634.402_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka/step2_TTbar_14TeV+Run4D121PU_Patatrack_PixelOnlyAlpaka.log
  • 34634.751: 34634.751_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka/step2_TTbar_14TeV+Run4D121PU_HLT75e33TimingAlpaka.log

More RelVal errors were reported; the list above is truncated.

Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 6 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4280553
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4280530
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found
