Hi,
Thank you for your excellent implementation of D3PM.
When I run d3pm_runner.py, the vb_loss is only 0.0002, which is significantly smaller than the ce_loss. Is this an inherent characteristic of D3PM, or does it occur only with specific transition matrices, such as uniform or absorbing?