I'm using KAT in place of standard attention. The validation losses differ by 0.000338227 between the runs.
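For scale, here is a minimal sketch of how that gap could be expressed both as an absolute and as a relative difference. The helper name `loss_gap` and the loss values below are hypothetical placeholders (the actual per-run losses are not given above); only the 0.000338227 gap comes from the runs.

```python
def loss_gap(loss_a: float, loss_b: float) -> tuple[float, float]:
    """Return (absolute gap, gap relative to loss_a) for two validation losses."""
    if loss_a == 0:
        raise ValueError("loss_a must be non-zero to compute a relative gap")
    abs_gap = abs(loss_a - loss_b)
    rel_gap = abs_gap / abs(loss_a)
    return abs_gap, rel_gap


# ASSUMPTION: placeholder losses, NOT the real values from the runs.
# They are chosen only so that the gap equals the reported 0.000338227.
baseline_loss = 2.0
kat_loss = baseline_loss + 0.000338227

abs_gap, rel_gap = loss_gap(baseline_loss, kat_loss)
print(f"absolute gap: {abs_gap:.9f}")
print(f"relative gap: {rel_gap:.4%}")
```

Substituting the real validation losses for the placeholders gives the relative gap, which is usually easier to interpret than the raw difference.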