First of all, thank you for the implementation. It's very helpful. I have one question: after applying sync batch norm, training consumes more GPU memory than with normal batch norm. Is that expected?