
Project 2: Zhanbo Lin #49

Open
skszb wants to merge 4 commits into CIS5650-Fall-2025:main from skszb:main
skszb commented Sep 18, 2025

Repo Link

I'm using a late day for this project.

Implemented features:

  • CPU sequential scan and stream compact algorithm
  • Naive parallel scan
  • Work-efficient parallel scan & GPU stream compaction
  • Thrust's implementation wrapper
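
The scan and compaction features listed above can be illustrated with a small host-side sketch. This is not the code from the PR: the function names `exclusiveScan`, `workEfficientScan`, and `compactWithScan` are hypothetical, the scan is assumed to be an exclusive prefix sum, and compaction is assumed to mean "drop zero elements". `workEfficientScan` runs the up-sweep/down-sweep (Blelloch) scheme sequentially, with each inner-loop iteration standing in for the work one GPU thread would do, and it assumes a power-of-two input length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential exclusive prefix sum: out[i] = in[0] + ... + in[i-1], out[0] = 0.
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Sequential simulation of the up-sweep / down-sweep exclusive scan.
// Assumption: data.size() is a non-zero power of two.
std::vector<int> workEfficientScan(std::vector<int> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Up-sweep (reduce): build partial sums in place, doubling the stride.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const int left = data[i - stride];
            data[i - stride] = data[i];
            data[i] += left;
        }
    }
    return data;
}

// Stream compaction via scan: flag the kept elements, scan the flags to get
// output indices, then scatter. Assumption: "kept" means non-zero.
std::vector<int> compactWithScan(const std::vector<int>& in) {
    if (in.empty()) return {};
    std::vector<int> flags(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }
    const std::vector<int> indices = exclusiveScan(flags);
    const std::size_t count =
        static_cast<std::size_t>(indices.back() + flags.back());
    std::vector<int> out(count);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (flags[i]) out[static_cast<std::size_t>(indices[i])] = in[i];
    }
    return out;
}
```

On a GPU, the inner loops would become parallel threads and the flag, scan, and scatter steps would typically be separate kernels; the sequential form here only shows the data flow.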

Additional Features:

  • (Extra Credit in Part 5) Work-efficient scan outperforms both CPU and naïve GPU implementations
  • (Extra Credit 2) Enhanced the work-efficient scan with shared memory and added support for arrays of arbitrary size by splitting the work into blocks. Iterative kernel launches simulate the recursion, which allows buffers to be pre-allocated and kernel execution to be timed more accurately
  • Explored variadic function templates and implemented a helper function (testForIterations() in testing_helper.hpp) to repeatedly execute a target function and compute the average runtime, reducing manual effort in performance testing
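
As a rough illustration of the variadic-template timing helper described above, the sketch below averages the wall-clock time of a callable over several runs. It is not the PR's `testForIterations()`: the name `averageRuntimeMs`, its signature, and the use of `std::chrono::steady_clock` are all assumptions made for this example.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a helper like testForIterations(): calls
// func(args...) `iterations` times and returns the mean runtime in ms.
template <typename Func, typename... Args>
double averageRuntimeMs(int iterations, Func&& func, Args&&... args) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    double totalMs = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        // Arguments are passed as lvalues (not forwarded) so that they
        // remain valid for every iteration.
        func(args...);
        const auto stop = Clock::now();
        totalMs +=
            std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return totalMs / iterations;
}
```

A call might look like `averageRuntimeMs(100, someScan, n, out, in)`, where `someScan` is any callable accepting those arguments. Because CUDA kernel launches are asynchronous, a host-side timer like this one only measures GPU work if the callable synchronizes before returning; CUDA events are a common alternative for timing kernels.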

Some thoughts:

  • Really enjoyed this project, but I wish we had been given a crash course on interpreting NVIDIA Nsight Compute profiling reports beforehand.

- cpu scan
- naive scan
- helper functions for profiling N iterations
Implemented:
- naive scan
- efficient scan
- compact using efficient scan
- wrapper for thrust::scan
- work-efficient scan using shared memory

- some adjustments in testing code