Building on the remarkable performance of text-to-image diffusion models, text-guided video editing has recently attracted growing attention. Existing video editing studies introduce cross-frame attention as an implicit way to estimate inter-frame correspondence, which yields temporally consistent videos. However, because these methods rely on models pre-trained on text-image pairs, they do not handle the property unique to video: motion. When a video is edited with prompts, the attention map of a prompt word that implies motion (e.g., 'running', 'moving') tends to be poorly estimated, which leads to inaccurate editing. To address this problem, we propose the 'Motion Map Injection' (MMI) module, which takes motion into account explicitly. The MMI module provides text-to-video (T2V) editing models with a simple but effective way to convey motion information in three steps: 1) extracting the motion map, 2) computing the similarity between the motion map and the attention map of each prompt token, and 3) injecting the motion map into the attention maps. Experimental results show that input videos can be edited accurately with the MMI module. To the best of our knowledge, ours is the first method that utilizes the motion of a video for text-guided video editing.
You can find more experimental results on our project page.
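To illustrate how the three MMI steps fit together, here is a minimal, self-contained sketch. It is not the released implementation: frame differencing stands in for the actual motion-map extraction, the attention maps are assumed to be per-token cross-attention maps of shape (T, H, W), and all function names are hypothetical.

```python
# Minimal sketch of the three MMI steps, assuming per-token cross-attention maps of
# shape (T, H, W). Frame differencing is used as a stand-in for the actual
# motion-map extraction; all names are hypothetical and not taken from this repo.
from typing import Dict

import torch
import torch.nn.functional as F


def extract_motion_map(frames: torch.Tensor) -> torch.Tensor:
    """Step 1: estimate where motion occurs. frames: (T, C, H, W) in [0, 1]."""
    diff = (frames[1:] - frames[:-1]).abs().mean(dim=1)   # (T-1, H, W)
    motion = torch.cat([diff, diff[-1:]], dim=0)          # pad back to (T, H, W)
    return motion / (motion.amax(dim=(1, 2), keepdim=True) + 1e-8)


def motion_similarity(motion_map: torch.Tensor, attn_map: torch.Tensor) -> torch.Tensor:
    """Step 2: cosine similarity between the motion map and one token's attention map."""
    m = F.interpolate(motion_map[:, None], size=attn_map.shape[-2:], mode="bilinear")[:, 0]
    return F.cosine_similarity(m.flatten(1), attn_map.flatten(1), dim=1).mean()


def inject_motion_map(motion_map: torch.Tensor,
                      attn_maps: Dict[str, torch.Tensor],
                      alpha: float = 0.5) -> Dict[str, torch.Tensor]:
    """Step 3: blend the motion map into each token's attention map, weighting the
    injection by how strongly that token's attention already agrees with the motion."""
    injected = {}
    for token, attn in attn_maps.items():
        m = F.interpolate(motion_map[:, None], size=attn.shape[-2:], mode="bilinear")[:, 0]
        weight = alpha * motion_similarity(motion_map, attn).clamp(min=0.0)
        injected[token] = (1.0 - weight) * attn + weight * m
    return injected
```

In this repository, the motion prompt passed to Stage 2 below (e.g., "flowing") indicates which word in the editing prompt describes the motion.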
The environment is very similar to Video-P2P.
The versions of the packages we installed are:
- torch: 1.12.1
- xformers: 0.0.15.dev0+0bad001.d20230712
We installed xformers through the link provided in the Video-P2P repository.
pip install -r requirements.txt

We use the pre-trained Stable Diffusion model. You can download it here.
Since our code is built on the Video-P2P codebase, you can refer to their GitHub repository if needed.
Please replace pretrained_model_path in the configs with the path to your Stable Diffusion checkpoint.
To download the pre-trained model, please refer to diffusers.
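For reference, one way to download and store a checkpoint with diffusers looks roughly like the following; the model id and local path are examples, not requirements of this repository.

```python
# Example only: the model id and save path below are assumptions, not part of this repo.
from diffusers import StableDiffusionPipeline

# Download the pipeline from the Hugging Face Hub (pick the checkpoint you want to use).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Save it locally and point pretrained_model_path in the configs to this directory.
pipe.save_pretrained("./checkpoints/stable-diffusion-v1-5")
```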
# Stage 1: Tuning for model initialization.
# You can reduce the number of tuning epochs to speed this up.
python run_tuning.py --config="configs/cloud-1-tune.yaml"

# Stage 2: Attention Control
python run_attention_flow.py --config="configs/cloud-1-p2p.yaml" --motion_prompt "Please enter motion prompt"
# If the prompt is "clouds flowing under a skyscraper", the motion prompt is "flowing".
# You can input the motion prompt as below.
python run_attention_flow.py --config="configs/cloud-1-p2p.yaml" --motion_prompt "flowing"

Find your results in Video-P2P/outputs/xxx/results.









