It would be better to report both total FLOPs and non-embedding FLOPs. For the non-embedding figure, leave out both the `lm_head` and the initial `nn.Embedding`.
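A minimal sketch of what I mean, in plain Python so it does not depend on the model code. It assumes a parameter-proportional estimate where the per-parameter factor of 6 is a rule of thumb, not something taken from this code base. The names `split_flops`, `embedding_prefixes`, and the `embed` / `lm_head` prefixes are hypothetical, so adjust them to whatever the model actually uses.

```python
def split_flops(param_counts, tokens, embedding_prefixes=("embed", "lm_head"),
                flops_per_param_token=6):
    """Return (total_flops, non_embedding_flops) for a parameter-proportional estimate.

    param_counts: mapping of parameter name -> number of elements, e.g. built with
        {name: p.numel() for name, p in model.named_parameters()}
    tokens: number of tokens processed.
    embedding_prefixes: hypothetical name prefixes of the input embedding and the
        output head; both are excluded from the non-embedding figure.
    flops_per_param_token: assumed rule-of-thumb factor (6 for forward + backward).
    """
    total_params = sum(param_counts.values())
    # str.startswith accepts a tuple, so one check covers every excluded prefix.
    embedding_params = sum(
        count for name, count in param_counts.items()
        if name.startswith(embedding_prefixes)
    )
    non_embedding_params = total_params - embedding_params
    total_flops = flops_per_param_token * total_params * tokens
    non_embedding_flops = flops_per_param_token * non_embedding_params * tokens
    return total_flops, non_embedding_flops


# Toy parameter counts, purely illustrative.
example_counts = {
    "embed.weight": 1000 * 64,
    "blocks.0.attn.weight": 4 * 64 * 64,
    "blocks.0.mlp.weight": 8 * 64 * 64,
    "lm_head.weight": 1000 * 64,
}
total, non_embedding = split_flops(example_counts, tokens=10)
print(f"total FLOPs: {total:,}  non-embedding FLOPs: {non_embedding:,}")
```

If the `lm_head` weight is tied to the embedding, `named_parameters()` should list the shared tensor only once, so the prefix filter would need to match whichever name survives.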