Welcome! This repository is a collection of my experiments and examples as I learn about GPU programming and parallel computing with CUDA. Here you'll find code exploring everything from simple vector addition to more advanced parallel algorithms.
I am following *Programming Massively Parallel Processors: A Hands-on Approach* by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj to learn CUDA fundamentals and more advanced topics.
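To build and run the examples, you need:
- An NVIDIA GPU with CUDA support
- The CUDA Toolkit
- NVCC, the NVIDIA CUDA compiler (included with the CUDA Toolkit)
- A host C++ compiler supported by your CUDA Toolkit version (NVCC uses it to compile the host code)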
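Before building anything, it can help to confirm that the driver, toolkit, and GPU are all visible. The sketch below is not part of the repository; `reportCudaDevices` is a hypothetical helper that uses the CUDA runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties` to list each device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints basic properties of every visible CUDA device.
// Returns the device count, or -1 if the runtime reports an error
// (for example, when no compatible driver is installed).
int reportCudaDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::printf("Device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        // Compute capability (major.minor) determines which features the GPU supports.
        std::printf("Device %d: %s, compute capability %d.%d, %zu MiB global memory\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return count;
}
```

Call it from your own `main` and compile the file with `nvcc`; a return value of 0 or -1 means the examples will not run on that machine.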
| Program | Description | Details |
|---|---|---|
| Vector Addition | Basic CUDA program for parallel vector addition. Each thread computes one element of the result vector. | README |
| Matrix Multiplication | Two versions: a naive kernel and a tiled kernel that uses shared memory. Demonstrates key CUDA concepts such as shared memory, tiling, and memory coalescing. | README |
| One-Head Attention | Scaled dot-product attention in CUDA. Computes Attention(Q, K, V) = softmax(QKᵀ / √d) × V, where d is the dimension of the query and key vectors, using multiple optimized CUDA kernels. | README |
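The Vector Addition entry describes the one-thread-per-element pattern. A minimal sketch of that pattern follows; it illustrates the idea rather than reproducing the repository's file, and the names `vecAddKernel`, `vecAdd`, and `CUDA_CHECK`, as well as the block size of 256, are choices made here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Each thread computes exactly one element of the result: C[i] = A[i] + B[i].
__global__ void vecAddKernel(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {  // the last block may contain threads past the end of the data
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host wrapper: allocate device memory, copy inputs, launch, copy back.
// Assumes a and b have the same length.
std::vector<float> vecAdd(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;  // a launch with zero blocks would be an error
    size_t bytes = n * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    CUDA_CHECK(cudaMalloc(&dC, bytes));
    CUDA_CHECK(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice));

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    vecAddKernel<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
    CUDA_CHECK(cudaGetLastError());  // catches launch-configuration errors

    // cudaMemcpy on the default stream waits for the kernel to finish before copying.
    CUDA_CHECK(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return c;
}
```

The `i < n` guard matters because the grid is rounded up to a whole number of blocks, so some threads in the last block have no element to process.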
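The Matrix Multiplication entry mentions tiling with shared memory. The sketch below shows the general idea for square, row-major matrices with an assumed tile size of 16: in each phase, a block stages one tile of each input in shared memory, synchronizes, and accumulates partial dot products. The names `tiledMatMulKernel`, `matMul`, and `TILE_WIDTH` are hypothetical, and the repository's kernel may differ in its details.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define TILE_WIDTH 16  // assumed tile size; blocks are TILE_WIDTH x TILE_WIDTH threads

#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// P = M * N for square width x width matrices stored row-major.
// Each block computes one tile of P; each thread computes one element of P.
__global__ void tiledMatMulKernel(const float* M, const float* N, float* P, int width) {
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;
    int col = blockIdx.x * TILE_WIDTH + tx;
    float pValue = 0.0f;

    // Walk across the shared dimension one tile ("phase") at a time.
    int numPhases = (width + TILE_WIDTH - 1) / TILE_WIDTH;
    for (int ph = 0; ph < numPhases; ++ph) {
        int mCol = ph * TILE_WIDTH + tx;
        int nRow = ph * TILE_WIDTH + ty;
        // Threads with consecutive tx read consecutive addresses, so the loads coalesce.
        // Out-of-range elements are padded with 0 so they do not change the sum.
        Mds[ty][tx] = (row < width && mCol < width) ? M[row * width + mCol] : 0.0f;
        Nds[ty][tx] = (nRow < width && col < width) ? N[nRow * width + col] : 0.0f;
        __syncthreads();  // the whole tile must be loaded before anyone reads it

        for (int k = 0; k < TILE_WIDTH; ++k) {
            pValue += Mds[ty][k] * Nds[k][tx];
        }
        __syncthreads();  // everyone must finish reading before the tile is overwritten
    }
    if (row < width && col < width) {
        P[row * width + col] = pValue;
    }
}

// Hypothetical host wrapper: multiplies two row-major width x width matrices.
std::vector<float> matMul(const std::vector<float>& m, const std::vector<float>& n, int width) {
    if (width <= 0) return {};
    std::vector<float> p(static_cast<size_t>(width) * width);
    size_t bytes = p.size() * sizeof(float);

    float *dM = nullptr, *dN = nullptr, *dP = nullptr;
    CUDA_CHECK(cudaMalloc(&dM, bytes));
    CUDA_CHECK(cudaMalloc(&dN, bytes));
    CUDA_CHECK(cudaMalloc(&dP, bytes));
    CUDA_CHECK(cudaMemcpy(dM, m.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dN, n.data(), bytes, cudaMemcpyHostToDevice));

    int tiles = (width + TILE_WIDTH - 1) / TILE_WIDTH;
    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid(tiles, tiles);
    tiledMatMulKernel<<<grid, block>>>(dM, dN, dP, width);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(p.data(), dP, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dM));
    CUDA_CHECK(cudaFree(dN));
    CUDA_CHECK(cudaFree(dP));
    return p;
}
```

Each element loaded from global memory is reused by TILE_WIDTH threads from shared memory, which is where tiling saves bandwidth; the zero padding lets the same kernel handle widths that are not multiples of the tile size.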
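One step of the attention formula that is easy to get wrong is the row-wise softmax of the scaled scores. The sketch below assumes the scores QKᵀ have already been computed into a row-major matrix and normalizes each row with one thread per row, subtracting the row maximum for numerical stability. The names `scaledSoftmaxRowsKernel` and `scaledSoftmax` are hypothetical, and this is not necessarily how the repository's kernels divide the work.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// In-place row-wise scaled softmax: S[r][j] = exp(scale*S[r][j] - max_r) / sum_r.
// Subtracting the row maximum keeps expf from overflowing and does not change the
// result, because the common factor exp(-max_r) cancels in the ratio.
__global__ void scaledSoftmaxRowsKernel(float* S, int nRows, int nCols, float scale) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per row
    if (row >= nRows) return;
    float* s = S + static_cast<size_t>(row) * nCols;

    float maxVal = s[0] * scale;
    for (int j = 1; j < nCols; ++j) maxVal = fmaxf(maxVal, s[j] * scale);

    float sum = 0.0f;
    for (int j = 0; j < nCols; ++j) {
        float e = expf(s[j] * scale - maxVal);
        s[j] = e;  // keep the unnormalized value for the final pass
        sum += e;
    }
    for (int j = 0; j < nCols; ++j) s[j] /= sum;
}

// Hypothetical host wrapper: scores holds QK^T (nRows x nCols, row-major) and d is
// the query/key dimension, so the scale factor is 1/sqrt(d).
std::vector<float> scaledSoftmax(std::vector<float> scores, int nRows, int nCols, int d) {
    if (nRows <= 0 || nCols <= 0) return scores;
    size_t bytes = scores.size() * sizeof(float);  // assumes size == nRows * nCols
    float scale = 1.0f / std::sqrt(static_cast<float>(d));

    float* dS = nullptr;
    CUDA_CHECK(cudaMalloc(&dS, bytes));
    CUDA_CHECK(cudaMemcpy(dS, scores.data(), bytes, cudaMemcpyHostToDevice));

    const int threadsPerBlock = 128;
    const int blocks = (nRows + threadsPerBlock - 1) / threadsPerBlock;
    scaledSoftmaxRowsKernel<<<blocks, threadsPerBlock>>>(dS, nRows, nCols, scale);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(scores.data(), dS, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dS));
    return scores;
}
```

One thread per row keeps the code short but leaves most of the GPU idle when there are few rows; faster versions usually assign a warp or block to each row and compute the maximum and the sum with parallel reductions.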