Haitao Lin1 · Peiyan Hu2 · Minsi Ren2 · Zhifeng Gao3 · Zhi-Ming Ma2 · Guolin Ke3 · Tailin Wu1 · Stan Z. Li1
1 Westlake University 2 Chinese Academy of Sciences 3 DP Technology
Summary: We propose Explicit ShortCut (ESC), a framework that provides theoretical justification for the validity of shortcut models and disentangles concrete component-level design choices, thereby enabling systematic identification of improvements. With the proposed improvements, the resulting one-step model achieves a new state-of-the-art FID-50k of 2.85 on ImageNet 256×256, and further reaches an FID-50k of 2.52 with 2× the training steps, under the classifier-free guidance setting without pre-training, distillation, or curriculum learning.
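For intuition, shortcut-style models condition the network on a step size in addition to the timestep, so a single full-length step can map noise directly to data. Below is a minimal, hypothetical sketch of one-step sampling under this generic formulation; the model signature, time convention, and class conditioning are placeholders, not ESC's actual interface:

```python
# Hypothetical one-step sampling sketch for a shortcut-style model.
# Assumptions (NOT ESC's actual API): the network model(x, t, d, y) predicts
# an average velocity over a step of size d, with pure noise at t = 0.
import torch

@torch.no_grad()
def one_step_sample(model, batch_size, latent_shape, num_classes, device="cuda"):
    x0 = torch.randn(batch_size, *latent_shape, device=device)  # noise at t = 0
    t = torch.zeros(batch_size, device=device)                  # start time
    d = torch.ones(batch_size, device=device)                   # full step size d = 1
    y = torch.randint(0, num_classes, (batch_size,), device=device)  # class labels
    # x1 = x0 + d * model(x0, t=0, d=1, y); with d = 1 this is a single step
    return x0 + model(x0, t, d, y)
```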
This implementation uses LMDB datasets of VAE-encoded latent representations for efficient training. The preprocessing pipeline is reimplemented from MAR. Once ImageNet is downloaded to "YOUR/IMAGENET/PATH", run the following to create the LMDB dataset:
```bash
torchrun preprocess_scripts/main_cache_imagenet.py \
--folder_dir "YOUR/IMAGENET/PATH/train" \
--target_lmdb "YOUR/DESTINATION/LMDB/PATH"
```

See ./scripts for detailed training commands.
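The cached latents can then be read back with a standard PyTorch Dataset. Below is a minimal sketch, assuming one pickled record per stringified-integer key; the actual key/value layout is defined by preprocess_scripts/main_cache_imagenet.py, so adjust accordingly:

```python
# Minimal LMDB reader sketch. Assumption: each entry is a pickled record
# (e.g., a VAE latent plus its class label) stored under the key str(index).
import pickle

import lmdb
from torch.utils.data import Dataset


class LatentLMDB(Dataset):
    def __init__(self, lmdb_path):
        # readonly + lock=False keeps the environment safe for read-only workers
        self.env = lmdb.open(lmdb_path, readonly=True, lock=False, readahead=False)
        with self.env.begin() as txn:
            self.length = txn.stat()["entries"]

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        with self.env.begin() as txn:
            return pickle.loads(txn.get(str(idx).encode()))
```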
We provide pretrained checkpoints for models trained with class-consistent minibatching:
| Models | Iterations (Epochs) | Checkpoint Links | FID-50k |
|---|---|---|---|
| ESC-XL/2 | 1.2M (240) | Hugging Face/ESC-XL2 | 2.85 |
| ESC-XL/2 | 2.4M (480) | Hugging Face/ESC-XL2 | 2.53 |
| ESC-B/2 | 600k (240) | Hugging Face/ESC-B2 | 5.78 |
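A checkpoint can also be fetched programmatically via huggingface_hub. This is a hypothetical sketch; the real repo id and filename come from the Hugging Face links in the table above:

```python
# Hypothetical download sketch; replace repo_id/filename with the actual
# values behind the table links above.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="YOUR-ORG/ESC-XL2",  # placeholder repo id
    filename="esc_xl2.pt",       # placeholder checkpoint filename
)
print(ckpt_path)  # pass this path to evaluate.py via --ckpt
```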
For trained checkpoints, or downloaded ones (.pt files), we provide a distributed evaluation script for large-scale sampling and quantitative evaluation (FID, IS).
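For reference, FID is the Fréchet distance between Gaussians fitted to Inception features of the reference and generated sets. Here is a minimal sketch of the metric given two precomputed statistics files, assuming they store 'mu' and 'sigma' arrays (inspect the .npz to confirm the key names):

```python
# Frechet distance sketch: FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
# Assumes the .npz stats files store the feature mean as 'mu' and covariance as 'sigma'.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix square root
    if not np.isfinite(covmean).all():
        # numerical fallback: regularize the covariances and retry
        offset = np.eye(sigma1.shape[0]) * eps
        covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
    covmean = covmean.real  # drop tiny imaginary components
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

ref = np.load("./fid_stats/adm_in256_stats.npz")
gen = np.load("/PATH/TO/GENERATED/STATS.npz")  # hypothetical: stats of your samples
print(frechet_distance(ref["mu"], ref["sigma"], gen["mu"], gen["sigma"]))
```

To sample 50,000 images and compute the metrics with the provided script, run: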
```bash
torchrun --nproc_per_node=8 --nnodes=1 evaluate.py \
--ckpt "/PATH/TO/THE/CHECKPOINTS" \
--model "SiT-B/2" \
--resolution 256 \
--cfg-scale 1.0 \
--per-proc-batch-size 128 \
--num-fid-samples 50000 \
--sample-dir "./fid_dir" \
--compute-metrics \
--num-steps 1 \
--fid-statistics-file "./fid_stats/adm_in256_stats.npz" \
--adapt-model
```

If a data-type error occurs, it usually means the installed NumPy or PyTorch version does not match the statistics file; run the following instead:
```bash
torchrun --nnodes=1 evaluate.py \
--ckpt "/PATH/TO/THE/CHECKPOINTS" \
--model "SiT-B/2" \
--resolution 256 \
--cfg-scale 1.0 \
--per-proc-batch-size 128 \
--num-fid-samples 50000 \
--sample-dir "./fid_dir" \
--compute-metrics \
--num-steps 1 \
--fid-statistics-file "./fid_stats/adm_float32_in256_stats.npz" \
--adapt-model
```

This codebase is built upon REPA. We thank the authors for their excellent work and open-source contribution.
We also thank the original MeanFlow implementation, Gsunshine/MeanFlow, as well as the PyTorch reimplementations Gsunshine/py-meanflow and zhuyu-cs/MeanFlow, which helped with early code restructuring.
We also thank the (re-)implementations of IMM, sCT, and CM, which informed our further remodularization.
If you find our work helpful to your research, please cite the following:
```bibtex
@misc{lin2025designonestepdiffusionshortcutting,
      title={On the Design of One-step Diffusion via Shortcutting Flow Paths},
      author={Haitao Lin and Peiyan Hu and Minsi Ren and Zhifeng Gao and Zhi-Ming Ma and Guolin Ke and Tailin Wu and Stan Z. Li},
      year={2025},
      eprint={2512.11831},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.11831},
}
```
