Each method currently logs its train and eval metrics in its own way. As part of unifying everything, it'd be nice to have a single log function and a single evaluate function shared by all classes.
We could potentially split this into two shared parent scripts (or base classes), one for DPO-based models and one for PPO-based ones, and simply import them wherever they are needed.
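To make the proposal a bit more concrete, here is a rough sketch of what the shared pieces could look like. Everything in it is hypothetical and only for illustration: the class names (`MetricsLoggingMixin`, `DPOStyleBase`, `PPOStyleBase`), the `compute_metrics` hook, the metric names, and the `train/...` and `eval/...` key layout are assumptions, not existing code in this repo.

```python
from typing import Dict, Iterable, List


class MetricsLoggingMixin:
    """Hypothetical shared base: one log() and one evaluate() for every trainer."""

    metric_prefix = ""  # subclasses override this, e.g. "dpo" or "ppo"

    def __init__(self) -> None:
        # In-memory history; a real trainer would forward this to its logger.
        self.history: List[Dict[str, float]] = []

    def log(self, metrics: Dict[str, float], mode: str = "train") -> Dict[str, float]:
        """Prefix metric names consistently and record them."""
        if mode not in ("train", "eval"):
            raise ValueError(f"mode must be 'train' or 'eval', got {mode!r}")
        prefix = f"{mode}/{self.metric_prefix}/" if self.metric_prefix else f"{mode}/"
        record = {prefix + name: float(value) for name, value in metrics.items()}
        self.history.append(record)
        return record

    def evaluate(self, batches: Iterable[dict]) -> Dict[str, float]:
        """Average the per-batch metrics and log them under the eval prefix."""
        totals: Dict[str, float] = {}
        count = 0
        for batch in batches:
            for name, value in self.compute_metrics(batch).items():
                totals[name] = totals.get(name, 0.0) + float(value)
            count += 1
        averaged = {name: total / count for name, total in totals.items()} if count else {}
        return self.log(averaged, mode="eval")

    def compute_metrics(self, batch: dict) -> Dict[str, float]:
        """The only method-specific part; each family implements its own."""
        raise NotImplementedError


class DPOStyleBase(MetricsLoggingMixin):
    """Assumed parent for DPO-based methods (preference pairs)."""

    metric_prefix = "dpo"

    def compute_metrics(self, batch: dict) -> Dict[str, float]:
        margin = batch["chosen_reward"] - batch["rejected_reward"]
        return {"reward_margin": margin, "accuracy": float(margin > 0)}


class PPOStyleBase(MetricsLoggingMixin):
    """Assumed parent for PPO-based methods (rollouts with a KL term)."""

    metric_prefix = "ppo"

    def compute_metrics(self, batch: dict) -> Dict[str, float]:
        return {"reward": batch["reward"], "kl": batch["kl"]}
```

With something like this, each method would only need to implement `compute_metrics`, while the naming and aggregation of train and eval metrics stay identical across classes. Whether this is better done through inheritance, as sketched here, or through plain helper functions that get imported is an open design choice.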