
Implicit Style-Content Separation using B-LoRA

itsitgroup/B-LoRA-Colab-Notebook

My Additions to This Repository

Simple Training Script and Tips

I've forked this repository and added a simple training script that is accessible via Google Colab. Here are some important findings and tips for using the script:

  • Recommended GPU: Use at least an L4 GPU for training. The training process uses about 10.5 GB of VRAM.
  • Training Duration:
    • Training a single model takes about 16 minutes with the default settings.
    • While the L4 GPU has 22 GB of VRAM, enough to train two models simultaneously, doing so slows the process significantly to around 35 minutes total (17.5 minutes per model). It is therefore more efficient to train one model at a time.
  • Image Recommendations:
    • For style training, it is better to use 5 to 8 images.
    • For content training, using just 1 image is sufficient if only using a reference image for style transfer.
    • However, a LoRA trained on a single content image does not perform well when the style comes from a text prompt. For example, a prompt like "A [v45] in style of gold" may not yield good results, as seen in the notebook referenced by the authors.
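The VRAM numbers above can be turned into a quick pre-launch sanity check. This is only a sketch: the 10.5 GB per-run figure comes from the notes above, and how you query free GPU memory (e.g. `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`) is left up to you.

```python
PER_RUN_GIB = 10.5  # observed footprint of one B-LoRA training run (from the notes above)

def max_concurrent_runs(free_gib: float, per_run_gib: float = PER_RUN_GIB) -> int:
    """How many training runs fit in the reported free VRAM."""
    if free_gib < per_run_gib:
        return 0
    return int(free_gib // per_run_gib)

if __name__ == "__main__":
    # An L4 exposes ~22 GiB, so two runs fit -- though, as noted above,
    # training sequentially is faster in practice.
    print(max_concurrent_runs(22.0))  # -> 2
```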

Feel free to experiment, and good luck! Please share your findings as well; I have yet to test this extensively myself.

Open In Colab


Implicit Style-Content Separation using B-LoRA

arXiv Open In Colab HuggingFace demo

Teaser Image

This repository contains the official implementation of the B-LoRA method, which enables implicit style-content separation of a single input image for various image stylization tasks. B-LoRA leverages the power of Stable Diffusion XL (SDXL) and Low-Rank Adaptation (LoRA) to disentangle the style and content components of an image, facilitating applications such as image style transfer, text-based image stylization, and consistent style generation.

🔧 21.5.2024: Important Update 🔧

Some issues with newer versions of diffusers and PEFT caused the fine-tuning process to converge more slowly than desired. In the meantime, we have uploaded the original training script that we used in the paper.

Please note that we used a previous version of diffusers (0.25.0) and did not use PEFT.
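If you run into the convergence issue, one workaround is to recreate an environment closer to the paper's by pinning diffusers to 0.25.0 and not installing PEFT. The exact set of compatible companion versions is an assumption; check requirements.txt first.

```shell
pip install -r requirements.txt
# Re-pin diffusers to the version used in the paper (0.25.0), in case
# requirements.txt pulled in a newer release.
pip install "diffusers==0.25.0"
```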

Getting Started

Prerequisites

  • Python 3.11.6+
  • PyTorch 2.1.1+
  • Other dependencies (specified in requirements.txt)

Installation

  1. Clone this repository:

    git clone https://github.com/yardenfren1996/B-LoRA.git
    cd B-LoRA
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    

    (For Windows 10, see here.)

Usage

  1. Training B-LoRAs

    To train the B-LoRAs for a given input image, run:

    accelerate launch train_dreambooth_b-lora_sdxl.py \
     --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
     --instance_data_dir="<path/to/example_images>" \
     --output_dir="<path/to/output_dir>" \
     --instance_prompt="<prompt>" \
     --resolution=1024 \
     --rank=64 \
     --train_batch_size=1 \
     --learning_rate=5e-5 \
     --lr_scheduler="constant" \
     --lr_warmup_steps=0 \
     --max_train_steps=1000 \
     --checkpointing_steps=500 \
     --seed="0" \
     --gradient_checkpointing \
     --use_8bit_adam \
     --mixed_precision="fp16"
    

This will optimize the B-LoRA weights for content and style and store them in output_dir. Replace the placeholders for instance_data_dir, output_dir, and instance_prompt (in our paper we use "A [v]").
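The "B" in B-LoRA refers to training LoRA weights only on specific SDXL transformer blocks, which is what makes the content/style separation implicit: after training, the checkpoint can be split by block name. A minimal sketch follows; the two block names are assumptions based on the paper's block analysis, so verify them against the keys in your own checkpoint.

```python
CONTENT_BLOCK = "unet.up_blocks.0.attentions.0"  # assumed content block
STYLE_BLOCK = "unet.up_blocks.0.attentions.1"    # assumed style block

def split_b_lora(state_dict):
    """Split a LoRA state dict into (content, style) sub-dicts by block name."""
    content = {k: v for k, v in state_dict.items() if CONTENT_BLOCK in k}
    style = {k: v for k, v in state_dict.items() if STYLE_BLOCK in k}
    return content, style

# Usage with dummy keys standing in for real LoRA tensors:
sd = {
    "unet.up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.lora_A.weight": "content-tensor",
    "unet.up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.lora_A.weight": "style-tensor",
}
content_sd, style_sd = split_b_lora(sd)
```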

Apps Image

  2. Inference

    For image stylization based on a reference style image (1), run:

    python inference.py --prompt="A <c> in <s> style" --content_B_LoRA="<path/to/content_B-LoRA>" --style_B_LoRA="<path/to/style_B-LoRA>" --output_path="<path/to/output_dir>"
    

    This will generate new images with the content of the first B-LoRA and the style of the second B-LoRA. Note that you need to replace <c> and <s> in the prompt according to the prompts used during optimization.

    For text-based image stylization (2), run:

    python inference.py --prompt="A <c> made of gold" --content_B_LoRA="<path/to/content_B-LoRA>" --output_path="<path/to/output_dir>"
    

    This will generate new images with the content of the given B-LoRA and the style specified by the text prompt.

    For consistent style generation (3), run:

    python inference.py --prompt="A backpack in <s> style" --style_B_LoRA="<path/to/style_B-LoRA>" --output_path="<path/to/output_dir>"
    

    This will generate new images with the specified content and the style of the given B-LoRA.

    Several additional parameters you can pass to inference.py include:

    1. --content_alpha, --style_alpha for controlling the strength of the adapters.
    2. --num_images_per_prompt for specifying the number of output images.
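As a rough mental model of the alpha flags (an illustrative sketch, not the repository's actual implementation): each adapter's low-rank update is scaled before it influences the frozen base weights, so lowering an alpha weakens that adapter.

```python
def merge_updates(base, content_delta, style_delta,
                  content_alpha=1.0, style_alpha=1.0):
    """W' = W + alpha_c * dW_content + alpha_s * dW_style, element-wise."""
    return [
        b + content_alpha * c + style_alpha * s
        for b, c, s in zip(base, content_delta, style_delta)
    ]

# Usage: halving style_alpha halves the style adapter's contribution.
merged = merge_updates([1.0, 1.0], [0.2, 0.0], [0.0, 0.4], style_alpha=0.5)
# -> [1.2, 1.2]
```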

    (For AUTOMATIC1111 and ComfyUI support, see this issue.)

Citation

If you use B-LoRA in your research, please cite the following paper:

@misc{frenkel2024implicit,
      title={Implicit Style-Content Separation using B-LoRA}, 
      author={Yarden Frenkel and Yael Vinker and Ariel Shamir and Daniel Cohen-Or},
      year={2024},
      eprint={2403.14572},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is licensed under the MIT License.

Contact

If you have any questions or suggestions, please feel free to open an issue or contact the authors at yardenfren@gmail.com.
