Hi! Thanks for sharing this great work! I may have missed something, but would it be possible to incorporate a text condition into the diffusion model in a simple way? I understand this would involve cross-attention between the text condition and the noisy image, but do you have any suggestions on where to inject the textual information so that it is both effective and easy to implement? Thanks for your time!
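
To make the question concrete, here is a minimal PyTorch sketch of the kind of cross-attention block I have in mind. It is only an illustration under assumptions: the class name `TextCrossAttention`, the dimensions, and the idea of applying it to an intermediate feature map of the denoiser are all hypothetical and not taken from this repository. Queries come from the noisy-image features, while keys and values come from the text embeddings.

```python
import torch
import torch.nn as nn


class TextCrossAttention(nn.Module):
    """Hypothetical block: image tokens (queries) attend to text tokens (keys/values)."""

    def __init__(self, img_dim: int, text_dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(img_dim)
        # kdim/vdim let the text embedding width differ from the image channel width.
        self.attn = nn.MultiheadAttention(
            embed_dim=img_dim,
            num_heads=num_heads,
            kdim=text_dim,
            vdim=text_dim,
            batch_first=True,
        )

    def forward(self, feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map of the noisy image
        # text_emb: (B, T, text_dim) token embeddings from some text encoder (assumed)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attended, _ = self.attn(self.norm(tokens), text_emb, text_emb)
        tokens = tokens + attended  # residual connection keeps the unconditional path
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Example usage with made-up sizes
block = TextCrossAttention(img_dim=64, text_dim=32)
out = block(torch.randn(2, 64, 8, 8), torch.randn(2, 5, 32))
```

The output has the same shape as the input feature map, so a block like this could in principle be dropped in after an existing layer of the denoiser. Which layers or resolutions would be the best place for it is exactly what I am unsure about.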