Given an input initialized by nearest-pixel inpainting (left) and its corresponding prompt, we inpaint the image via noise optimization.
| Input | Inpainted Output |
| --- | --- |
| ![]() | ![]() |

Prompt: *"A young man with short black hair styled upward, dark brown eyes, and fair skin with light stubble. He has well-defined eyebrows and is wearing a black collar or shirt. The background is a clean white."*
- Clone the repository

  ```bash
  git clone git@github.com:ubc-vision/sonic.git
  cd sonic
  ```

- Install PyTorch

  Install PyTorch with CUDA support; visit https://pytorch.org for installation instructions. The codebase was tested with PyTorch 2.7.1+cu128 and 2.9.1+cu128.
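  For example (verify the current command in the selector at pytorch.org; this index URL is an assumption based on the CUDA 12.8 builds mentioned above), a matching build can typically be installed with:

  ```bash
  pip install torch --index-url https://download.pytorch.org/whl/cu128
  ```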
- Install required packages

  For inpainting only:

  ```bash
  pip install diffusers==0.31.0 transformers==4.46.3 accelerate==1.9.0 pillow==10.4.0 numpy==1.26.4 protobuf==6.31.1 sentencepiece==0.2.0
  ```

  For metrics evaluation (optional):

  ```bash
  pip install torchmetrics torchvision tqdm pandas open_clip_torch hpsv2 image-reward
  ```
Run the inpainting script with predefined datasets:

```bash
python sonic_inpaint.py \
    --dataset_name FFHQ \
    --image_index 00064 \
    --num_iterations 20 \
    --step_nums 20 \
    --CFG_scale 2.0 \
    --learning_rate 3.0
```
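Here, `--num_iterations` and `--step_nums` appear to set the number of noise-optimization iterations and diffusion denoising steps, respectively, and `--CFG_scale` presumably controls the classifier-free guidance strength. For reference, standard classifier-free guidance combines the model's conditional and unconditional predictions as below (a general sketch with illustrative names, not code from `sonic_inpaint.py`):

```python
import torch

# Illustrative tensors standing in for the model's noise predictions.
eps_uncond = torch.randn(1, 4, 64, 64)  # prediction with an empty prompt
eps_cond = torch.randn(1, 4, 64, 64)    # prediction with the text prompt
cfg_scale = 2.0                         # value passed via --CFG_scale

# Classifier-free guidance: extrapolate from the unconditional toward the
# conditional prediction; cfg_scale = 1.0 recovers the conditional one.
eps_guided = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
```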
Run the inpainting script with custom images and prompts:

```bash
python sonic_inpaint.py \
    --image_path /path/to/image.png \
    --mask_path /path/to/mask.png \
    --prompt "Your text prompt here" \
    --num_iterations 20 \
    --step_nums 20 \
    --CFG_scale 2.0 \
    --learning_rate 3.0
```

Results are saved to `inpaint_results/{dataset_name}_{image_name}_steps{step_nums}_iter{num_iterations}/`:
- `target_image.png` - Masked target image
- `epsilon/` - Optimized noise at each iteration
- `x_0_hat/` - Predicted clean images during optimization
- `inpainted_output/inpainted.png` - Final inpainted result
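For example, a run's outputs can be browsed with a few lines of Python (a hypothetical snippet: the directory name follows the pattern above for the FFHQ example, and the frame filenames inside `x_0_hat/` are assumed to be PNGs):

```python
from pathlib import Path
from PIL import Image

# {dataset_name}_{image_name}_steps{step_nums}_iter{num_iterations}
run_dir = Path("inpaint_results/FFHQ_00064_steps20_iter20")

# Final inpainted result.
final = Image.open(run_dir / "inpainted_output" / "inpainted.png")
final.show()

# Intermediate predictions, one per optimization iteration (assumed PNGs).
for frame in sorted((run_dir / "x_0_hat").glob("*.png")):
    print(frame.name)
```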
- ✅ Inpainting code with sample images and prompts
- ✅ Environment setup guide
- ✅ Metrics code
- ✅ (Experimental) Wan 2.1 code update
- ⬜ (To be updated!) Video inpainting examples with Wan2.1
If you find this work useful, please cite:

```bibtex
@article{baek2025sonicspectraloptimizationnoise,
  title={SONIC: Spectral Optimization of Noise for Inpainting with Consistency},
  author={Seungyeon Baek and Erqun Dong and Shadan Namazifard and Mark J. Matthews and Kwang Moo Yi},
  year={2025},
  eprint={2511.19985},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.19985},
}
```

