Collection of information about programming models, particularly implementation details and lower-level behavior. Knowing the lower levels of abstraction is invaluable for debugging and for reasoning about performance.
-
Khronos ICD loader: https://github.com/KhronosGroup/OpenCL-ICD-Loader
It can be useful to build this yourself to debug problems (for example, an OpenCL implementation library not showing up in the list of devices because it silently fails at the dynamic-link stage)
-
OpenCL device simulator, Oclgrind: https://github.com/jrprice/Oclgrind
-
Open source implementation, POCL: http://portablecl.org/
-
SPIR-V is used as an intermediate, device-independent distribution format
See the OpenMP_examples page.
Observations and comments on programming with OpenMP are on the OpenMP_observations page.
- Kokkos core library: https://github.com/kokkos/kokkos
Chapel is a programming language designed for productive parallel programming. It now supports GPU programming.
Machine learning has given rise to a number of models for writing kernels.
- Triton language and compiler: https://github.com/triton-lang/triton
- Not to be confused with NVIDIA's Triton Inference Server
- Apache TVM. Its TensorIR is at the semantic level closest to Triton. A nice tutorial is available from the Machine Learning Compilation course.
My own repository for samples is here: https://github.com/markdewing/qmc_kernels
The vector_add kernel has the most implementations, being the simplest example that actually does something: https://github.com/markdewing/qmc_kernels/tree/master/kernels/vector_add