How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui Chang
·
Jiahui Liu
·
Zhengzhe Liu
·
Xiaoyang Lyu
·
Yi-Hua Huang
·
Xin Tao
·
Pengfei Wan
·
Di Zhang
·
Xiaojuan Qi✉
The University of Hong Kong | Kling Team, Kuaishou Technology | Lingnan University
✉Corresponding author
This repository provides a one-pass video evaluation pipeline: given one or more videos, it extracts frames, runs three external feature extractors (DINOv2, RAFT optical flow, and UniDepth monocular depth) via small proxy scripts, packs the results into a fixed format, feeds them to a 3D-aware scorer (L3DE), and finally writes per-video scores.
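For reference, the same pipeline can also be driven from Python instead of the shell. The sketch below only uses the CLI flags documented in this README; the paths are placeholders.

# Minimal sketch: invoking the pipeline programmatically for one video.
# Only the CLI flags documented in this README are used; paths are examples.
import subprocess
from pathlib import Path

def score_video(video: Path, work_root: Path, weights: Path) -> None:
    """Run l3de_pipeline.py on a single video and wait for it to finish."""
    subprocess.run(
        [
            "python", "l3de_pipeline.py",
            "--input", str(video),
            "--work-root", str(work_root),
            "--l3de-weights", str(weights),
        ],
        check=True,  # raise CalledProcessError if the pipeline exits non-zero
    )

score_video(Path("examples/demo.mp4"), Path("workdir"), Path("weights/L3DE.pth"))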
Note on third-party code
The proxy/ directory contains three vendored open-source tools (DINOv2, RAFT, and UniDepth) plus the extractor scripts that call them.
A recommended layout:
.
├── l3de_pipeline.py       # main pipeline script (video → frames → features → L3DE score)
├── environment.yml        # conda environment to reproduce your setup
├── proxy/
│   ├── dinov2/            # Appearance proxy: DINOv2 code (third-party)
│   │   └── extract_*.py   # proxy scripts to call the corresponding models
│   ├── RAFT/              # Motion proxy: RAFT optical-flow code (third-party)
│   │   └── extract_*.py   # proxy scripts to call the corresponding models
│   └── UniDepth/          # Geometry proxy: UniDepth code (third-party)
│       └── extract_*.py   # proxy scripts to call the corresponding models
├── weights/
│   └── L3DE.pth           # your trained L3DE checkpoint
├── README.md
└── LICENSE
conda env create -f environment.yml
conda activate L3DE
- The provided environment.yml reproduces the environment used to run the pipeline.
- If your CUDA / GPU setup is different, install a compatible torch after activating the environment.
- Download the pre-trained L3DE model from: Google Drive
Then place it as:
weights/
└── L3DE.pth
Run:
python l3de_pipeline.py \
--input ./examples/demo.mp4 \
--work-root ./workdir \
--l3de-weights ./weights/L3DE.pth
What happens:
- The script samples frames from the video (by default: 25 frames from the first 4 seconds); see the frame-sampling sketch after this list.
- The script calls the three proxy extractors under proxy/ (DINOv2, RAFT, UniDepth).
- The script packs the three modalities into the format that L3DE expects.
- The script runs the L3DE model and writes a score file.
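As a reference for the first step, here is a minimal frame-sampling sketch, assuming OpenCV and NumPy; the output filename pattern is an illustrative choice, and the actual logic lives in l3de_pipeline.py.

# Sketch of the default sampling: 25 frames from the first 4 seconds.
import cv2
import numpy as np
from pathlib import Path

def sample_frames(video_path, out_dir, num_frames=25, max_seconds=4.0):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Restrict sampling to the first `max_seconds` of the clip.
    limit = max(1, min(total, int(round(fps * max_seconds))))
    indices = np.linspace(0, limit - 1, num_frames).astype(int)
    for i, idx in enumerate(indices):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # Illustrative filename pattern; the pipeline may name frames differently.
        cv2.imwrite(str(out_dir / f"{i:05d}.png"), frame)
    cap.release()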
You should get a structure like:
workdir/
└── demo/
├── frames/ # extracted RGB frames
├── dinov2/ # appearance features
├── flows/ # RAFT optical flow
├── mdepth/ # depth / geometry features
├── l3de_scores.npy # optional numpy dump
└── scores.json # {"l3de_score": ...}
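Once a run finishes, the results can be read back as follows; the JSON key matches the structure above, while the exact contents of the optional .npy dump depend on your run.

# Read back the per-video outputs of a single run.
import json
import numpy as np

with open("workdir/demo/scores.json") as f:
    result = json.load(f)
print("L3DE score:", result["l3de_score"])

# The optional numpy dump, if it was written.
scores = np.load("workdir/demo/l3de_scores.npy")
print("Raw score array shape:", scores.shape)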
To score every video in a directory (batch mode):
python l3de_pipeline.py \
--input ./videos \
--video-glob "*.mp4" \
--work-root ./workdir \
--l3de-weights ./weights/L3DE.pth
- Every video matching --video-glob will be processed.
- A CSV collecting all per-video scores will be written; see the snippet below for reading it back.
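A quick way to inspect the collected results; the CSV filename below is an assumption, so adjust it to whatever the script actually writes under your --work-root.

# Inspect the per-video score CSV produced by a batch run.
# The path below is an assumption; point it at the file the script writes.
import csv

with open("workdir/l3de_scores.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)  # one entry per processed video, including its L3DE score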
Please cite our paper if you find our work helpful.
@inproceedings{chang2025far,
title={How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach},
author={Chang, Chirui and Liu, Jiahui and Liu, Zhengzhe and Lyu, Xiaoyang and Huang, Yi-Hua and Tao, Xin and Wan, Pengfei and Zhang, Di and Qi, Xiaojuan},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10307--10317},
year={2025}
}
This repository makes use of several excellent open-source projects:
- DINOv2 (appearance features)
- RAFT (optical flow)
- UniDepth (monocular depth / geometry)