
Beyond External Guidance:
Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training

Lingchen Sun<sup>1,2</sup> | Rongyuan Wu<sup>1,2</sup> | Zhengqiang Zhang<sup>1</sup> | Ruibin Li<sup>1</sup> | Yujing Sun<sup>1,2</sup> | Shuaizheng Liu<sup>1,2</sup> | Lei Zhang<sup>1,2</sup>

<sup>1</sup>The Hong Kong Polytechnic University, <sup>2</sup>OPPO Research Institute

[Figure: Self-Transcendence]

🧡 Summary

Both shallow and deep layers gradually learn more discriminative patterns over time, but the shallow layers progress much more slowly. This indicates that the slow convergence of DiT mainly stems from the difficulty of learning clean, semantically rich features in the shallow layers.

We answer the question: *Can internal features serve as effective semantic guidance signals for training the shallow layers?* We introduce Self-Transcendence, a simple yet effective self-guided training strategy that achieves REPA-level performance without any external feature supervision. Our approach produces more discriminative and semantically richer features than the pre-trained DINO features used in REPA, and it significantly improves training efficiency and generation quality, achieving FID = 1.25 at just 400 epochs.

[Figure: Self-Transcendence]

⏰ Update

  • 2026.1.12: The paper and this repo are released.

⭐ If Self-Transcendence is helpful to your research or projects, please help star this repo. Thanks! 🤗

🌟 Overview of the framework

We find that the most effective guiding features should meet two criteria:

(1) they should have a clean structure, in the sense that they can effectively help shallow blocks distinguish noise from signal.

(2) they should be semantically discriminative, making it easier for shallow layers to learn effective representations.

With these considerations, we propose a two-stage training framework.

(a) First, we use clean VAE features as guidance to help the model distinguish useful information from noise in its shallow layers.

(b) After a certain number of iterations, the model has learned more meaningful representations. We then freeze this model and use its representations as a fixed teacher. To enhance the semantic expressiveness of the features, we build a self-guided representation that better aligns with the target conditions.

VAE-based alignment accelerates SiT training, while leveraging this model for self-transcendence leads to further improvements.

Citations

If our code helps your research or work, please consider citing our paper. The BibTeX reference is:

@article{sun2026beyond,
  title={Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training},
  author={Sun, Lingchen and Wu, Rongyuan and Zhang, Zhengqiang and Li, Ruibin and Sun, Yujing and Liu, Shuaizheng and Zhang, Lei},
  journal={arXiv preprint arXiv:2601.07773},
  year={2026}
}

License

This project is released under the Apache 2.0 license.

Acknowledgement

This project is based on REPA. Thanks for the awesome work.

Contact

If you have any questions, please contact: ling-chen.sun@connect.polyu.hk
