Open
Description
In the slides, you mention that an all-reduce is decomposed into a reduce-scatter followed by an all-gather, so it costs roughly twice as much as either of those ops. However, in XLA on TPU, a reduce-scatter is often implemented as an all-reduce followed by a dynamic-slice, which suggests the opposite, namely that all-reduce is much faster than reduce-scatter. Can you explain the discrepancy?
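To make the two decompositions concrete, here is a minimal pure-Python sketch that simulates the collectives on lists. It only checks that both rewrites produce the same results. The function names (`reduce_scatter`, `all_gather`, `all_reduce`, `dynamic_slice`, `ring_bytes_per_device`) are hypothetical helpers written for this illustration, not XLA or JAX APIs, and the cost function assumes a textbook ring algorithm, which may not match what XLA actually does on TPU.

```python
# Pure-Python simulation: each "device" holds a list of numbers.
# All helper names are hypothetical and exist only for this sketch.

def reduce_scatter(bufs):
    """Device i ends up with the element-wise sum of shard i over all devices."""
    n = len(bufs)
    shard = len(bufs[0]) // n
    return [
        [sum(b[i * shard + j] for b in bufs) for j in range(shard)]
        for i in range(n)
    ]

def all_gather(shards):
    """Every device ends up with the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def all_reduce(bufs):
    """Every device ends up with the full element-wise sum."""
    total = [sum(col) for col in zip(*bufs)]
    return [list(total) for _ in bufs]

def dynamic_slice(buf, start, size):
    """Take `size` elements of `buf` beginning at `start`."""
    return buf[start:start + size]

def ring_bytes_per_device(op, n, nbytes):
    """Bytes each device sends, ASSUMING a textbook ring algorithm."""
    step = (n - 1) / n * nbytes
    return {"reduce_scatter": step, "all_gather": step, "all_reduce": 2 * step}[op]

n, shard = 4, 2
bufs = [[d * 10 + k for k in range(n * shard)] for d in range(n)]

# Decomposition from the slides: all-reduce = reduce-scatter + all-gather.
via_rs_ag = all_gather(reduce_scatter(bufs))
assert via_rs_ag == all_reduce(bufs)

# Rewrite described for XLA on TPU: reduce-scatter = all-reduce + dynamic-slice.
via_ar_slice = [
    dynamic_slice(b, i * shard, shard) for i, b in enumerate(all_reduce(bufs))
]
assert via_ar_slice == reduce_scatter(bufs)
```

Both rewrites are semantically valid, so equivalence alone does not say which is faster. Under the assumed ring model, `ring_bytes_per_device("all_reduce", n, nbytes)` is twice the `"reduce_scatter"` value, which is the source of the apparent contradiction in the question.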