Fixes for atomic coalescing at L1, correlated with QV100 hardware #33
abhaumick wants to merge 8 commits into accel-sim:dev
Conversation
Sub core & some minor bug fix
- best-case coalescing of atomic operations: full CAM-based search, integrated with DPRINTF under the ATOMICS flag
- replaced full CAM coalescing with common-case coalescing, correlated against QV100 GPU hardware
- added an ATOMICS_DETAIL trace flag, made ATOMICS prints concise, and disabled tracing / restored default trace flags in the QV100 tested-cfgs
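The full CAM-based search from the first commit can be sketched roughly as follows. This is a hypothetical illustration: the function name coalesce_atomics_cam and its return convention are placeholders, not the simulator's actual memory_coalescing_arch_atomic() code. The idea is that every lane's address is compared against every access group formed so far, and lanes that hit the same address share a single memory access.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of full CAM-style atomic coalescing: compare each
// lane's address against every group formed so far; lanes targeting the
// same address are merged into one memory access. Returns the number of
// memory accesses generated for the warp.
size_t coalesce_atomics_cam(const std::vector<uint64_t>& lane_addrs) {
    std::vector<uint64_t> groups;  // one entry per generated access
    for (uint64_t addr : lane_addrs) {
        bool merged = false;
        for (uint64_t g : groups) {
            if (g == addr) {  // CAM hit: lane joins an existing access
                merged = true;
                break;
            }
        }
        if (!merged) groups.push_back(addr);  // CAM miss: new access
    }
    return groups.size();
}
```

Per the second commit message, this all-pairs comparison was later replaced with a cheaper common-case coalescing scheme that correlated better against QV100 hardware.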
Correlation of the atomic ubenches does not look significantly better than the latest (as of this review) dev branch of GPGPU-Sim; atomic_add_bw_diverge is still off by a lot. The code makes sense, though.
Still waiting for feedback from the other reviewers and the original author.
tgrogers left a comment
This has been lingering a long time.
@abhaumick , @cesar-avalos3 can you update me on the current state here?
Several years ago, when we were checking the accuracy of the atomics, we had a set of uBenches that tested latency and bandwidth when warps were updating both the same address and different addresses. From what I remember, the results were a disaster.
@abhaumick spent a non-trivial amount of time trying to fix this, and I think this PR is the result of that time.
Do we have uBenches in the regressions that test the atomics? I certainly remember a long email chain about our Volta atomic correlations.
I think this code was a good fix.
- fixed atomic coalescing at L1 in warp_inst_t::memory_coalescing_arch_atomic()
- modified trace.h: added DPRINTF_RAW() to allow prints without the gpu_sim_cycle or gpu_tot_sim_cycle variables used by DPRINTF
- added config option gpgpu_shmem_atomic_warp_parts
- added trace streams
- resolves mismatch reported in Meeting Minutes -- 3/20/20