4 changes: 3 additions & 1 deletion .gitignore
@@ -25,7 +25,8 @@ build
.LSOverride

# Icon must end with two \r
Icon
Icon


# Thumbnails
._*
@@ -560,3 +561,4 @@ xcuserdata
*.xccheckout
*.moved-aside
*.xcuserstate
.vscode
44 changes: 38 additions & 6 deletions README.md
@@ -3,12 +3,44 @@ CUDA Stream Compaction

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Nicholas Liu
* [LinkedIn](https://www.linkedin.com/in/liunicholas6/)
* Tested on: Linux Mint 22 Wilma, AMD Ryzen 7 5800X @ 2.512GHz, 32GB RAM, GeForce GTX 1660 Ti

### (TODO: Your README)
# Project Description

Include analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
This project implements scan (prefix sum) in several ways. We compare a CPU implementation, a naive GPU version, and a work-efficient GPU version against the scan provided by the Thrust library.
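
The project's CUDA kernels are not shown here, so the following is only a sequential Python sketch of the textbook algorithms that the three hand-written variants are usually based on: a running-sum loop for the CPU, a Hillis-Steele style pass structure for the naive version, and a Blelloch up-sweep/down-sweep for the work-efficient version. It assumes an exclusive scan; the function names are hypothetical, and the inner loops stand in for what would be parallel threads on the GPU.

```python
def cpu_scan(values):
    """Sequential exclusive scan: out[i] is the sum of values[0..i-1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def naive_scan(values):
    """Hillis-Steele style scan: about log2(n) passes, O(n log n) additions."""
    n = len(values)
    # Shift right by one so the inclusive passes below yield an exclusive scan.
    cur = [0] + list(values[:-1]) if n else []
    offset = 1
    while offset < n:
        # One pass; on a GPU each index i would be handled by its own thread.
        cur = [cur[i] + cur[i - offset] if i >= offset else cur[i]
               for i in range(n)]
        offset *= 2
    return cur


def efficient_scan(values):
    """Blelloch style scan: up-sweep then down-sweep, O(n) additions."""
    n = len(values)
    if n == 0:
        return []
    # Pad with zeros to the next power of two.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep: build partial sums in place, as in a balanced reduction tree.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data[:n]
```

All three functions should agree on any input, which is a convenient correctness check before timing anything.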

# Performance Analysis

At very small array sizes, the CPU scan is the fastest. As the array grows, the CPU scan falls behind, and by 2^20 elements it has been overtaken by the Thrust implementation.

Error bars show one standard deviation over 1000 runs of each scan.

![chart: cpu better](img/cpu-better.png)
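
As a minimal sketch of how a one-sigma error bar can be derived from repeated timings, the snippet below uses the same `statistics.mean` and `statistics.stdev` calls as `analysis/project/graph.py`. The helper name `summarize` and the sample timings are made up for illustration.

```python
import statistics


def summarize(timings):
    """Return (mean, stdev, (low, high)) for a list of repeated timings.

    statistics.stdev is the sample standard deviation (n - 1 denominator),
    so at least two timings are required.
    """
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)
    return mean, stdev, (mean - stdev, mean + stdev)


# Hypothetical timings for one scan at one array size.
mean, stdev, (low, high) = summarize([1.0, 2.0, 3.0])
print(f"mean={mean}, stdev={stdev}, error bar spans {low} to {high}")
```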

Once the array reaches roughly 2^23 elements, the work-efficient scan also outperforms the CPU scan.

![chart: gpu better](img/gpu-better.png)

| Array size | Scan type | Mean time (ms) | Standard deviation (ms) |
|------------|-----------|---------------------|-------------------------|
| 256 | CPU | 7.585e-05 | 1.684521478821987e-05 |
| 256 | Naive | 0.019890368 | 0.018129259603148012 |
| 256 | Efficient | 0.041029536 | 0.03196243203484814 |
| 256 | Thrust | 0.022883072 | 0.07034156396333269 |
| 4096 | CPU | 0.00092551 | 0.00017840902353771144 |
| 4096 | Naive | 0.02674304 | 0.005283869080783783 |
| 4096 | Efficient | 0.053911488 | 0.0037359112841419267 |
| 4096 | Thrust | 0.0177352 | 0.009973741521411908 |
| 65536 | CPU | 0.017869165 | 0.003946497566300506 |
| 65536 | Naive | 0.099199506 | 0.18741031849147752 |
| 65536 | Efficient | 0.15358839999999999 | 0.34961337456361957 |
| 65536 | Thrust | 0.089322486 | 0.3254493673567683 |
| 1048576 | CPU | 0.27509278 | 0.048449150282649026 |
| 1048576 | Naive | 0.814048116 | 0.04939857649465461 |
| 1048576 | Efficient | 0.520357476 | 0.11646960803214108 |
| 1048576 | Thrust | 0.199752738 | 0.07898137835466129 |
| 8388608 | CPU | 4.41064898 | 1.176601505235431 |
| 8388608 | Naive | 7.03792806 | 0.37055519290846245 |
| 8388608 | Efficient | 3.77427417 | 0.2990962226148862 |
| 8388608 | Thrust | 0.519275972 | 0.20947768302315156 |
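
One rough way to read the crossover points off the table is sketched below. It copies the mean times from the table (assumed to be milliseconds, following the axis label in `analysis/project/graph.py`) and reports, for each GPU scan, the first measured array size at which it beats the CPU scan. The helper names are hypothetical.

```python
# Mean times copied from the table above; units assumed to be milliseconds.
mean_ms = {
    1 << 8: {"CPU": 7.585e-05, "Naive": 0.019890368,
             "Efficient": 0.041029536, "Thrust": 0.022883072},
    1 << 12: {"CPU": 0.00092551, "Naive": 0.02674304,
              "Efficient": 0.053911488, "Thrust": 0.0177352},
    1 << 16: {"CPU": 0.017869165, "Naive": 0.099199506,
              "Efficient": 0.15358839999999999, "Thrust": 0.089322486},
    1 << 20: {"CPU": 0.27509278, "Naive": 0.814048116,
              "Efficient": 0.520357476, "Thrust": 0.199752738},
    1 << 23: {"CPU": 4.41064898, "Naive": 7.03792806,
              "Efficient": 3.77427417, "Thrust": 0.519275972},
}


def speedup_over_cpu(size, scan_type):
    """Ratio of CPU time to the given scan's time; above 1 means faster than CPU."""
    row = mean_ms[size]
    return row["CPU"] / row[scan_type]


def first_size_faster_than_cpu(scan_type):
    """Smallest measured array size where scan_type beats the CPU, or None."""
    for size in sorted(mean_ms):
        if mean_ms[size][scan_type] < mean_ms[size]["CPU"]:
            return size
    return None


for scan_type in ("Naive", "Efficient", "Thrust"):
    print(scan_type, first_size_faster_than_cpu(scan_type))
```

With these values, Thrust first beats the CPU at 2^20 elements and the work-efficient scan at 2^23, while the naive scan never does within the measured range. Because only five sizes were measured, the true crossover may lie anywhere between adjacent sizes.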
4 changes: 4 additions & 0 deletions analysis/data/12.csv

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions analysis/data/16.csv

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions analysis/data/20.csv

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions analysis/data/23.csv

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions analysis/data/8.csv

Large diffs are not rendered by default.

58 changes: 58 additions & 0 deletions analysis/project/graph.py
@@ -0,0 +1,58 @@
import csv
import os
import statistics

import matplotlib.pyplot as plt
import numpy as np

# Directory of timing CSVs, one per array size, named "<log2(size)>.csv".
data_path = "../data"

# Each CSV holds one row of timings per scan type, in this order.
scan_types = ["CPU", "Naive", "Efficient", "Thrust"]
array_sizes = []
ys = {}    # scan type -> mean time for each array size
errs = {}  # scan type -> standard deviation for each array size

for scan_type in scan_types:
    ys[scan_type] = []
    errs[scan_type] = []

# Sort numerically by exponent so that 8.csv comes before 12.csv.
filenames = [filename for filename in os.listdir(data_path) if filename.endswith(".csv")]
filenames.sort(key=lambda name: int(name.split('.')[0]))
# filenames = filenames[:-1]
# filenames = [filenames[-1]]

for filename in filenames:
    array_sizes.append(1 << int(filename.split('.')[0]))
    with open(os.path.join(data_path, filename)) as file:
        reader = csv.reader(file)
        for (scan_type, line) in zip(scan_types, reader):
            # Skip empty fields, such as the one left by a trailing comma.
            contents = [float(val) for val in line if val != '']
            mean = statistics.mean(contents)
            stdev = statistics.stdev(contents)
            ys[scan_type].append(mean)
            errs[scan_type].append(stdev)

            print(array_sizes[-1], scan_type, mean, stdev)

# xs = np.arange(len(array_sizes))
# fig, ax = plt.subplots()
# bar_width = 0.1

# for (index, scan_type) in enumerate(scan_types):
#     ax.bar(
#         xs + (index - len(scan_types) / 2) * bar_width, ys[scan_type], bar_width, label = scan_type, yerr=errs[scan_type]
#     )

# ax.legend()

# # ax.set_ylim(0, 1)
# ax.set_ylabel("Milliseconds")

# # ax.set_xlabel("Array size")
# ax.set_xticks([])
# # ax.set_xticks(xs)
# # ax.set_xticklabels(array_sizes)

# ax.set_title("Time to run scan on array of 2^23")

# plt.savefig('chart.png')
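
The script above appears to assume that each file in `../data` is named after the base-2 exponent of the array size (for example `8.csv` for 2^8 elements) and holds four rows of timings, one per scan type in the order CPU, Naive, Efficient, Thrust. A minimal sketch of that parsing step on an in-memory file, with made-up timings and a hypothetical helper `summarize_csv`:

```python
import csv
import io
import statistics

SCAN_TYPES = ["CPU", "Naive", "Efficient", "Thrust"]


def summarize_csv(exponent, text):
    """Return (array_size, {scan_type: (mean, stdev)}) for one CSV's contents."""
    array_size = 1 << exponent
    rows = csv.reader(io.StringIO(text))
    summary = {}
    # zip pairs the first row with "CPU", the second with "Naive", and so on.
    for scan_type, row in zip(SCAN_TYPES, rows):
        # Drop empty fields, such as the one produced by a trailing comma.
        samples = [float(v) for v in row if v != ""]
        summary[scan_type] = (statistics.mean(samples), statistics.stdev(samples))
    return array_size, summary


# Made-up timings: four rows, three runs each, with trailing commas.
sample = "1.0,2.0,3.0,\n2.0,4.0,6.0,\n3.0,3.0,3.0,\n0.5,1.5,2.5,\n"
size, summary = summarize_csv(8, sample)
for scan_type, (mean, stdev) in summary.items():
    print(size, scan_type, mean, stdev)
```

Because rows are matched to scan types purely by position, a CSV with its rows in a different order would be mislabeled without any error.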