[Paper] [Video] [Hugging Face] [Demo]
In this work, we investigate automatic design composition from multimodal graphic elements. We propose LaDeCo, which introduces the layered design principle to accomplish this challenging task through two steps: layer planning and layered design composition.
- Clone this repository
```bash
git clone https://github.com/microsoft/elem2design.git
cd elem2design
```

- Install

```bash
conda create -n e2d python=3.10 -y
conda activate e2d
pip install --upgrade pip
pip install -e .
pip install -e thirdparty/opencole
pip install -e dataset/src
```

- Install additional packages for training cases

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```

Please check this folder for the layer planning code, or you can directly use the predicted labels here.
Index-to-layer mapping:
```
{
    0: "Background",
    1: "Underlay",
    2: "Logo/Image",
    3: "Text",
    4: "Embellishment"
}
```
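For reference, here is a minimal sketch of how the predicted labels can be grouped by layer with this mapping, assuming the labels are simply a list of integer indices, one per element (the exact file layout of the released predictions may differ):

```python
# Minimal sketch: group elements by predicted layer label.
# Assumes `labels` is a list of integer indices, one per element;
# the actual format of the released predictions may differ.
INDEX_TO_LAYER = {
    0: "Background",
    1: "Underlay",
    2: "Logo/Image",
    3: "Text",
    4: "Embellishment",
}

def group_by_layer(labels):
    """Return {layer_name: [element indices]} in layer order."""
    groups = {name: [] for name in INDEX_TO_LAYER.values()}
    for element_idx, label in enumerate(labels):
        groups[INDEX_TO_LAYER[label]].append(element_idx)
    return groups

print(group_by_layer([0, 2, 2, 1, 3, 3, 4]))
# {'Background': [0], 'Underlay': [3], 'Logo/Image': [1, 2], 'Text': [4, 5], 'Embellishment': [6]}
```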
We render an image for each element, which is useful during inference. Meanwhile, we also render the intermediate designs (denoted as `layer_{index}.png`) and use them for end-to-end training.
Please run the following script to get these assets:

```bash
python dataset/src/crello/render.py
```
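Conceptually, each intermediate design `layer_{index}.png` is just the canvas after all elements up to that layer have been placed. The repository relies on the OpenCole renderer for the actual rendering; the snippet below is only a simplified sketch of the idea, assuming per-element RGBA crops and top-left positions:

```python
from PIL import Image

def composite_layers(canvas_size, elements_by_layer):
    """Simplified sketch: paste RGBA element images layer by layer and save
    the intermediate canvas after each layer (cf. layer_{index}.png).
    `elements_by_layer` maps a layer index to a list of (image, (left, top))
    pairs. The real pipeline uses the OpenCole renderer instead of PIL."""
    canvas = Image.new("RGBA", canvas_size, (255, 255, 255, 255))
    for layer_index in sorted(elements_by_layer):
        for element_image, (left, top) in elements_by_layer[layer_index]:
            canvas.alpha_composite(element_image.convert("RGBA"), (left, top))
        canvas.save(f"layer_{layer_index}.png")
    return canvas
```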
After rendering, we move to the next step and construct the dataset according to the layered design principle. Each sample has 5 rounds of dialogue, where the model progressively predicts element attributes from the background layer to the embellishment layer.

```bash
python dataset/src/crello/create_dataset.py --tag ours
```
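To make the 5-round structure concrete, the sketch below shows roughly how one sample pairs a per-layer request with the attributes of that layer's elements. It is purely illustrative; the exact field names and prompt wording produced by `create_dataset.py` may differ:

```python
# Purely illustrative sketch of a 5-round layered sample; the actual JSON
# fields and prompts written by create_dataset.py may differ.
sample = {
    "images": ["element_0.png", "element_1.png"],  # rendered element images
    "conversations": [
        # Round 1: compose the Background layer from the given elements.
        {"role": "user", "content": "<elements> Compose the Background layer."},
        {"role": "assistant", "content": "<attributes of background elements>"},
        # Rounds 2-5 repeat this pattern for Underlay, Logo/Image, Text, and
        # Embellishment, each conditioned on the layers composed so far.
    ],
}
```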
We have a model available on Hugging Face; please download it for inference. The released model is trained on the public Crello dataset. We find that training on a dataset approximately five times larger leads to significantly improved performance; for a detailed evaluation, please refer to Table 2 in our paper. Unfortunately, we are unable to release that stronger model because it was trained on a private dataset.
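One convenient way to fetch the checkpoint is via `huggingface_hub`; the repository id below is a placeholder, so replace it with the id of the released model:

```python
from huggingface_hub import snapshot_download

# "<org>/<model-name>" is a placeholder; use the actual repo id of the released model.
local_dir = snapshot_download(repo_id="<org>/<model-name>")
print(local_dir)  # pass this path as --model_name_or_path during inference
```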
Now it is time to run inference using the prepared data and model:

```bash
CUDA_VISIBLE_DEVICES=0 python llava/infer/infer.py \
--model_name_or_path /path/to/model/checkpoint-xxxx \
--data_path /path/to/data/test.json \
--image_folder /path/to/crello_images \
--output_dir /path/to/output_dir \
--start_layer_index 0 \
    --end_layer_index 4
```
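At a high level, `--start_layer_index` and `--end_layer_index` bound which of the five layers are composed (0 = Background through 4 = Embellishment): inference walks through the layers in order and predicts each layer's element attributes conditioned on the design composed so far. The loop below is a hypothetical sketch of that process, not the actual `infer.py` code:

```python
# Hypothetical sketch of layered composition at inference time; the callables
# predict_layer_attributes(canvas, elements) and render(canvas, elements, attrs)
# are illustrative stand-ins, not functions from this repository.
LAYERS = ["Background", "Underlay", "Logo/Image", "Text", "Embellishment"]

def compose(elements_by_layer, predict_layer_attributes, render,
            start_layer_index=0, end_layer_index=4):
    """Walk the layers in order, predicting each layer's element attributes
    conditioned on the design composed so far, then updating the canvas."""
    canvas, attributes = None, {}
    for layer_index in range(start_layer_index, end_layer_index + 1):
        layer = LAYERS[layer_index]
        layer_elements = elements_by_layer.get(layer, [])
        # Predict position/size (and other attributes) for this layer's elements.
        attributes[layer] = predict_layer_attributes(canvas, layer_elements)
        # Render the newly placed elements on top of the previous layers.
        canvas = render(canvas, layer_elements, attributes[layer])
    return canvas, attributes
```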
Besides command-line inference, we also provide a demo interface that allows users to interact with the model via a web-based UI. This interface makes it more user-friendly and better suited for running inference on custom datasets.

To launch the web UI, run the following command:
```bash
CUDA_VISIBLE_DEVICES=0 python app/app.py --model_name_or_path /path/to/model/checkpoint-xxxx
```

We compute the LVM scores and geometry-related metrics for the generated designs:

```bash
python llava/metrics/llava_ov.py -i /path/to/output_dir
python llava/metrics/layout.py --pred /path/to/output_dir/pred.jsonl
```
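For a rough sense of what a geometry-related metric measures, the sketch below computes a simple average pairwise overlap between predicted element boxes; the metrics actually implemented in `layout.py` may be defined differently:

```python
# Simple illustration of a geometry-related layout metric: average pairwise
# overlap (IoU) between element boxes. The metrics implemented in
# llava/metrics/layout.py may be defined differently.
from itertools import combinations

def iou(box_a, box_b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    left, top = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    right, bottom = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def overlap_score(boxes):
    """Mean IoU over all box pairs; lower generally means a cleaner layout."""
    pairs = list(combinations(boxes, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

print(overlap_score([(0, 0, 100, 50), (50, 0, 150, 50), (0, 60, 100, 100)]))
```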
We fine-tune LLMs on the Crello training set for layered design composition. For your own dataset, please prepare the data in the given format and run:

```bash
bash scripts/finetune_lora.sh \
1 \
meta-llama/Llama-3.1-8B \
/path/to/dataset/train.json \
/path/to/image/base \
/path/to/output_dir \
50 \
2 \
16 \
250 \
2e-4 \
2e-4 \
cls_pooling \
Llama-3.1-8B_lora_ours \
32 \
64 \
    4
```

For example, the specific script in our setting is:

```bash
bash scripts/finetune_lora.sh \
1 \
meta-llama/Llama-3.1-8B \
dataset/dataset/json/ours/train.json \
dataset/dataset/crello_images \
output/Llama-3.1-8B_lora_ours \
50 \
2 \
16 \
250 \
2e-4 \
2e-4 \
cls_pooling \
Llama-3.1-8B_lora_ours \
32 \
64 \
    4
```

Remember to log in to Hugging Face using your Llama access token:
```bash
huggingface-cli login --token $TOKEN
```

The following is a list of supported LLMs:
- liuhaotian/llava-v1.5-7b
- liuhaotian/llava-v1.6-vicuna-7b
- meta-llama/Meta-Llama-3-8B
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Llama-3.1-8B
- meta-llama/Llama-3.1-8B-Instruct
- mistralai/Mistral-7B-v0.3
```bibtex
@InProceedings{lin2024elements,
    title     = {From Elements to Design: A Layered Approach for Automatic Graphic Design Composition},
    author    = {Lin, Jiawei and Sun, Shizhao and Huang, Danqing and Liu, Ting and Li, Ji and Bian, Jiang},
    booktitle = {CVPR},
    year      = {2025}
}
```
We would like to express our gratitude to CanvasVAE for providing the dataset, OpenCole for the rendering code, and LLaVA for the codebase. We deeply appreciate all the incredible work that made this project possible.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
