Program to explore CUDA and GPU related performance characteristics when multiplying matrices. Following results are for multiplying 3073x3073 matrices on a i7-8750H CPU and a GTX1060 GPU.
| Method | Running time*, ms | Kernel time, ms |
|---|---|---|
| CPU cell wise | 121000 | - |
| CPU layer wise | 10900 | - |
| GPU cell wise | 326 | 311 |
| GPU block wise | 134 | 119 |
| GPU layer wise | 37.1 | 23.7 |
| cuBLAS gemm | 36.8 | 22.0 |
*For GPUs running time includes data download from the GPU