Where available, `__hip_atomic_fetch_sub` can be used to implement the `atomicSub` family. Introduced in LLVM commit e3fbede7f3f.

@yxsamliu should we use the fetch_sub?

Thanks for checking on this, Jatin. I've just found out that this isn't practically feasible at the moment. Atomic sub operations on shared USM address ranges can be mishandled by the PCIe bus without explicit prefetches, and because the ROCm/HSA drivers have no way to catch this, the sub operations can end up not happening at all (see intel/llvm#7252 and this ROCm ticket for details). However, I think that behaviour is a bug that needs to be addressed, and correctness must be in place at the device and driver level before anything built on top can work. I'll leave this patch open as a reminder, but understand that merging it will break HIP atomics a little more until the lower levels of the stack work reliably.

Probably we should not use `__hip_atomic_fetch_sub` for `atomicSub_system`, since it does not work across PCIe. We may still use it for `atomicSub`.
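
For illustration, the split suggested in this thread could look roughly like the sketch below. It is not the code from this patch or from the HIP headers. The `sketch_*` names and the `SKETCH_*` macros are hypothetical, only the `unsigned int` overload is shown, and the system-scope variant is assumed to stay on a fetch-add of the negated value. The non-HIP branches use the GCC/Clang `__atomic_*` builtins purely so the sketch also compiles as ordinary host C++.

```cpp
// Sketch only: sketch_* and SKETCH_* are hypothetical names, not HIP header API.

// Use the HIP builtin only when compiling as HIP and the builtin exists.
#if defined(__HIP__) && defined(__HIP_MEMORY_SCOPE_AGENT)
#  if __has_builtin(__hip_atomic_fetch_sub)
#    define SKETCH_HAS_HIP_FETCH_SUB 1
#  endif
#endif

#if defined(__HIP__)
#  define SKETCH_DEVICE __attribute__((device))
#else
#  define SKETCH_DEVICE
#endif

// Device-scope subtract: uses __hip_atomic_fetch_sub where available.
// Returns the value stored at *address before the subtraction.
SKETCH_DEVICE inline unsigned int sketch_atomic_sub(unsigned int* address,
                                                    unsigned int val) {
#if defined(SKETCH_HAS_HIP_FETCH_SUB)
  return __hip_atomic_fetch_sub(address, val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_AGENT);
#else
  // Assumed fallback (host builds, or compilers without the HIP builtin).
  return __atomic_fetch_sub(address, val, __ATOMIC_RELAXED);
#endif
}

// System-scope subtract: deliberately avoids fetch_sub, per the comment above.
// Assumption: expressed as a fetch-add of the negated value; unsigned
// negation (0u - val) wraps modulo 2^32, so the result equals *address - val.
SKETCH_DEVICE inline unsigned int sketch_atomic_sub_system(unsigned int* address,
                                                           unsigned int val) {
#if defined(__HIP__) && defined(__HIP_MEMORY_SCOPE_SYSTEM)
  return __hip_atomic_fetch_add(address, 0u - val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_SYSTEM);
#else
  return __atomic_fetch_add(address, 0u - val, __ATOMIC_RELAXED);
#endif
}
```

Both functions return the old value, as `atomicSub` does. Whether the fetch-add form actually behaves correctly across PCIe on a given driver stack is exactly the open question in the comments above, so treat the system-scope branch as a placeholder.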