Hi, thank you for the great work and for releasing the pretrained weights of Diff-II.
I am currently conducting research on fine-grained text-to-image tuning and would like to reproduce your results more faithfully. In particular, I am very interested in the soft token / soft prompt learning pipeline used during training.
Would it be possible to release the corresponding training code or a minimal example of the soft token optimization process? Having access to the training pipeline would greatly help the community understand and extend Diff-II.
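In the meantime, to show where my understanding currently stands, below is a minimal sketch of what I imagine the soft token optimization might look like, written as a standard textual-inversion-style loop on top of a Stable Diffusion v1.5 backbone via `diffusers`. To be clear, this is only my guess, not your actual pipeline: the base model, the `<new-concept>` placeholder token, the prompt template, and the plain epsilon-prediction MSE loss are all my own assumptions, and I would love to know where Diff-II's soft token / soft prompt training differs from this.

```python
# Assumed sketch of soft-token optimization (textual-inversion style).
# NOT the Diff-II training code; base model, placeholder token, prompt, and
# loss are assumptions made for illustration only.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed backbone
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Register a placeholder token whose embedding row acts as the learnable soft token.
placeholder = "<new-concept>"  # hypothetical token name
tokenizer.add_tokens([placeholder])
text_encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids(placeholder)

# Freeze VAE, U-Net, and text encoder; only the token embedding table gets gradients.
for p in (*vae.parameters(), *unet.parameters(), *text_encoder.parameters()):
    p.requires_grad_(False)
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4, weight_decay=0.0)

def training_step(pixel_values):
    """One optimization step; pixel_values: (B, 3, 512, 512) scaled to [-1, 1]."""
    prompt = f"a photo of {placeholder}"  # assumed prompt template
    ids = tokenizer([prompt] * pixel_values.shape[0], padding="max_length",
                    max_length=tokenizer.model_max_length, truncation=True,
                    return_tensors="pt").input_ids
    encoder_states = text_encoder(ids)[0]

    # Encode images to latents and add noise at a random timestep.
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy = scheduler.add_noise(latents, noise, t)

    # Standard epsilon-prediction objective (my assumption for the loss).
    pred = unet(noisy, t, encoder_hidden_states=encoder_states).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()

    # Zero gradients for every embedding row except the soft token,
    # so only the placeholder embedding is updated.
    grads = embeddings.weight.grad
    keep = torch.zeros(grads.shape[0], dtype=torch.bool)
    keep[token_id] = True
    grads[~keep] = 0
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

If the actual Diff-II recipe differs (e.g. multiple soft tokens, a different conditioning scheme, or an extra regularization term), even a short pointer to those differences would already be very helpful.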