In addition to applying TokenMix before the first Transformer block, have you considered or tried applying it in the middle of the model?