About

Welcome! This repository is a collection of my experiments and examples as I learn about GPU programming and parallel computing with CUDA. Here you'll find code exploring everything from simple vector addition to more advanced parallel algorithms.

I am working through the following book to get a grasp of CUDA fundamentals and more advanced topics:

Programming Massively Parallel Processors: A Hands-on Approach by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj

Requirements

NVIDIA GPU with CUDA support
CUDA Toolkit
NVCC (NVIDIA CUDA Compiler, included with the CUDA Toolkit)
C++ compiler

Programs

| Program | Description | Page Link |
| --- | --- | --- |
| Vector Addition | Basic CUDA program demonstrating parallel vector addition. Each thread computes one element of the result vector. | README |
| Matrix Multiplication | Matrix multiplication implementation with two versions: naive kernel and tiled kernel using shared memory. Demonstrates key CUDA concepts like shared memory, tiling, and memory coalescing. | README |
| One-Head Attention | Implementation of scaled dot-product attention mechanism using CUDA. Computes Attention(Q, K, V) = softmax(QK^T / √d) × V using multiple optimized CUDA kernels. | README |
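
As a rough illustration of the "each thread computes one element" idea from the Vector Addition entry, here is a minimal sketch. It is not the code from this repository: the names `vecAddKernel`, `vecAdd`, and `CUDA_CHECK`, as well as the block size of 256 threads, are assumptions chosen for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Each thread computes exactly one element of c = a + b.
__global__ void vecAddKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {  // guard: the last block may have more threads than elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper: copies the inputs to the GPU, launches the
// kernel, and copies the result back. Assumes a and b have the same length.
std::vector<float> vecAdd(const std::vector<float>& a, const std::vector<float>& b) {
    int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));

    CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    int threads = 256;                          // assumed block size
    int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
    vecAddKernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());             // catch launch errors

    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaFree(d_c));
    return c;
}
```

Calling `vecAdd({1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})` from a `main` function should return a vector holding the element-wise sums. The bounds check `i < n` matters because the grid size is rounded up, so the final block can contain threads with no element to process.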

