I noticed that the supplementary material gives the number of training steps as 50,000, but main.py sets steps_per_epoch=500. Is this a mistake, or is 500 only the per-epoch count, with the total reached over multiple epochs? Additionally, the batch_size and gradient accumulation settings in the code differ from the values reported in the paper.
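
To make the comparison concrete, here is a minimal sketch of how I am relating the numbers. It assumes that total steps = epochs × steps_per_epoch and that effective batch size = batch_size × gradient accumulation steps × number of devices. The function names and the example values 8 and 4 are hypothetical and are not taken from main.py or the paper; only 500 and 50,000 come from the sources above.

```python
def total_steps(epochs: int, steps_per_epoch: int) -> int:
    """Total optimizer steps, assuming every epoch runs steps_per_epoch steps."""
    return epochs * steps_per_epoch


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update (assumed definition)."""
    return batch_size * grad_accum_steps * num_devices


steps_per_epoch = 500   # value in main.py
paper_steps = 50_000    # value in the supplementary material

# How many epochs would be needed for the two numbers to agree?
epochs_needed, remainder = divmod(paper_steps, steps_per_epoch)
print(epochs_needed, remainder)

# Hypothetical values, only to show the calculation.
print(effective_batch_size(batch_size=8, grad_accum_steps=4))
```

If the default number of epochs in main.py is 100, the two step counts would match; otherwise the total differs from the paper.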