I've forked this repository and added a simple training script that is accessible via Google Colab. Here are some important findings and tips for using the script:
- Recommended GPU: Use at least an L4 GPU for training. The training process takes up 10.5 GB of VRAM.
- Training Duration:
  - Training a single model takes about 16 minutes with the default settings.
  - Although the L4's 22 GB of VRAM allows two models to be trained simultaneously, doing so slows the process considerably, to around 35 minutes total (17.5 minutes per model). It is therefore more efficient to train one model at a time.
- Image Recommendations:
  - For style training, it is better to use 5 to 8 images.
  - For content training, a single image is sufficient if you only use a reference image for style transfer.
  - However, a content LoRA trained on a single image does not perform well with text-based style prompts; for example, a prompt like "A [v45] in style of gold" may not yield good results, as seen in the notebook mentioned by the authors.
Feel free to experiment, and good luck! Please share your findings as well; I have yet to test this extensively myself.
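If you want to check which GPU Colab assigned you and verify the VRAM figures above, the standard NVIDIA utility (generic tooling, not part of this repo) reports both:

```bash
# Show the attached GPU model (e.g. L4) and total/used VRAM.
# In a Colab cell, prefix with "!": !nvidia-smi
nvidia-smi
```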
This repository contains the official implementation of the B-LoRA method, which enables implicit style-content separation of a single input image for various image stylization tasks. B-LoRA leverages the power of Stable Diffusion XL (SDXL) and Low-Rank Adaptation (LoRA) to disentangle the style and content components of an image, facilitating applications such as image style transfer, text-based image stylization, and consistent style generation.
Newer versions of diffusers and PEFT caused the fine-tuning process to converge more slowly than desired. In the meantime, we have uploaded the original training script that we used in the paper.
Please note that we used a previous version of diffusers (0.25.0) and did not use PEFT.
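If you hit the convergence issue above, one workaround consistent with the authors' note is to pin diffusers to the version used in the paper. Treat this as a suggestion rather than an official fix, since the downgrade may conflict with other packages in your environment:

```bash
# Pin diffusers to the version used in the paper (PEFT was not used).
pip install diffusers==0.25.0
```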
### Prerequisites

- Python 3.11.6+
- PyTorch 2.1.1+
- Other dependencies (specified in `requirements.txt`)
### Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/yardenfren1996/B-LoRA.git
   cd B-LoRA
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   (For Windows 10, see here.)
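After installation, you can sanity-check that your interpreter and the key libraries match the prerequisites above (a generic check, not a script from this repo):

```bash
python --version
python -c "import torch, diffusers; print(torch.__version__, diffusers.__version__)"
```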
### Training B-LoRAs

To train the B-LoRAs for a given input image, run:

```bash
accelerate launch train_dreambooth_b-lora_sdxl.py \
 --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
 --instance_data_dir="<path/to/example_images>" \
 --output_dir="<path/to/output_dir>" \
 --instance_prompt="<prompt>" \
 --resolution=1024 \
 --rank=64 \
 --train_batch_size=1 \
 --learning_rate=5e-5 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
 --max_train_steps=1000 \
 --checkpointing_steps=500 \
 --seed="0" \
 --gradient_checkpointing \
 --use_8bit_adam \
 --mixed_precision="fp16"
```

This will optimize the B-LoRA weights for the content and style and store them in `output_dir`.

Replace the `instance_data_dir`, `output_dir`, and `instance_prompt` placeholders with your own values (in our paper we use "A [v]" as the instance prompt).
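For concreteness, a filled-in invocation might look like the following; the data directory, output path, and instance token here are illustrative placeholders, not assets shipped with the repo:

```bash
# Example only: swap in your own image folder, output path, and prompt token.
accelerate launch train_dreambooth_b-lora_sdxl.py \
 --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
 --instance_data_dir="./example_images/my_dog" \
 --output_dir="./output/dog_b_lora" \
 --instance_prompt="A [v]" \
 --resolution=1024 \
 --rank=64 \
 --train_batch_size=1 \
 --learning_rate=5e-5 \
 --lr_scheduler="constant" \
 --lr_warmup_steps=0 \
 --max_train_steps=1000 \
 --checkpointing_steps=500 \
 --seed="0" \
 --gradient_checkpointing \
 --use_8bit_adam \
 --mixed_precision="fp16"
```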
### Inference
For image stylization based on a reference style image (1), run:

```bash
python inference.py --prompt="A <c> in <s> style" --content_B_LoRA="<path/to/content_B-LoRA>" --style_B_LoRA="<path/to/style_B-LoRA>" --output_path="<path/to/output_dir>"
```

This will generate new images with the content of the first B-LoRA and the style of the second B-LoRA. Note that you need to replace `<c>` and `<s>` in the prompt according to the optimization prompt.

For text-based image stylization (2), run:

```bash
python inference.py --prompt="A <c> made of gold" --content_B_LoRA="<path/to/content_B-LoRA>" --output_path="<path/to/output_dir>"
```

This will generate new images with the content of the given B-LoRA and the style specified by the text prompt.

For consistent style generation (3), run:

```bash
python inference.py --prompt="A backpack in <s> style" --style_B_LoRA="<path/to/style_B-LoRA>" --output_path="<path/to/output_dir>"
```

This will generate new images with the specified content and the style of the given B-LoRA.
Several additional parameters that you can set in the `inference.py` file include:

- `--content_alpha`, `--style_alpha` for controlling the strength of the adapters.
- `--num_images_per_prompt` for specifying the number of output images.
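Putting the flags together, a full style-transfer call (case 1 above) with the adapter strengths and output count set explicitly could look like this; the paths, prompt tokens, and alpha values are illustrative placeholders:

```bash
# Example only: replace [v1]/[v2] with the tokens from your optimization
# prompts, and point the B-LoRA paths at your own trained checkpoints.
python inference.py \
  --prompt="A [v1] in [v2] style" \
  --content_B_LoRA="./output/dog_b_lora" \
  --style_B_LoRA="./output/painting_b_lora" \
  --content_alpha=1.0 \
  --style_alpha=1.0 \
  --num_images_per_prompt=4 \
  --output_path="./results"
```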
(For A1111 and ComfyUI, see this issue.)
If you use B-LoRA in your research, please cite the following paper:
```bibtex
@misc{frenkel2024implicit,
      title={Implicit Style-Content Separation using B-LoRA},
      author={Yarden Frenkel and Yael Vinker and Ariel Shamir and Daniel Cohen-Or},
      year={2024},
      eprint={2403.14572},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

This project is licensed under the MIT License.
If you have any questions or suggestions, please feel free to open an issue or contact the authors at yardenfren@gmail.com.

