Hello again!
The DiLoCo paper by Arthur Douillard et al. evaluates a non-i.i.d. data regime and compares it with standard data parallelism. Does OpenDiLoCo support this setup? If it does, could you please explain how to configure such an experiment? If not, is there an efficient way you would recommend to set it up along the lines of the approach described in the paper?
Thank you!