Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
English | 简体中文
- EASI is a unified evaluation suite for Spatial Intelligence in multimodal LLMs.
- After installation, you can quickly try a SenseNova-SI model with:
```bash
python run.py --data MindCubeBench_tiny_raw_qa \
    --model SenseNova-SI-1.2-InternVL3-8B \
    --verbose --reuse --judge extract_matching
```

EASI is a unified evaluation suite for Spatial Intelligence. It benchmarks state-of-the-art proprietary and open-source multimodal LLMs across a growing set of spatial benchmarks.
Key features include:
- Supports the evaluation of state-of-the-art Spatial Intelligence models.
- Systematically collects and integrates evolving Spatial Intelligence benchmarks.
As of v0.1.4, EASI supports 21 Spatial Intelligence models and 17 spatial benchmarks, and the list is continuously expanding. Full details are available at 👉 Supported Models & Benchmarks.
🌟 [2025-12-19] EASI v0.1.4 is released. Major updates include:
- Expanded benchmark support
Added 4 benchmarks: SPBench, MMSI-Video-Bench, VSI-SUPER-Recall, VSI-SUPER-Count.
For the full release history and detailed changelog, please see 👉 Changelog.
```bash
git clone --recursive https://github.com/EvolvingLMMs-Lab/EASI.git
cd EASI
pip install -e ./VLMEvalKit
```

Alternatively, build and start the Docker runtime image:

```bash
bash dockerfiles/EASI/build_runtime_docker.sh
docker run --gpus all -it --rm \
    -v /path/to/your/data:/mnt/data \
    --name easi-runtime \
    vlmevalkit_EASI:latest \
    /bin/bash
```
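Inside the container you can point the toolkit at the mounted data directory. This is a minimal sketch assuming the `/mnt/data` mount from the command above and the VLMEvalKit convention of reading the benchmark data root from the `LMUData` environment variable (default `~/LMUData`); check your installed version if in doubt:

```bash
# Assumption: benchmark data was mounted at /mnt/data via the -v flag above,
# and the toolkit reads its data root from the LMUData environment variable.
export LMUData=/mnt/data
```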
General command
```bash
python run.py --data {BENCHMARK_NAME} --model {MODEL_NAME} --judge {JUDGE_MODE} --verbose --reuse
```

Please refer to the Configuration section below for the full list of available models and benchmarks. See run.py for the full list of arguments.
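If your run.py follows the VLMEvalKit convention, `--data` and `--model` accept space-separated lists, so several benchmarks can be evaluated in one call. A hedged sketch using names that appear elsewhere in this README (verify they are registered in your installed configuration):

```bash
# Assumption: --data and --model accept space-separated lists (VLMEvalKit-style run.py).
# The benchmark and model names below are the examples used in this README.
python run.py \
    --data MindCubeBench_tiny_raw_qa VSI-Bench_32frame \
    --model SenseNova-SI-1.2-InternVL3-8B \
    --verbose --reuse --judge extract_matching
```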
Example
Evaluate SenseNova-SI-1.2-InternVL3-8B on MindCubeBench_tiny_raw_qa:
```bash
python run.py --data MindCubeBench_tiny_raw_qa \
    --model SenseNova-SI-1.2-InternVL3-8B \
    --verbose --reuse --judge extract_matching
```

This uses regex-based answer extraction. For LLM-based judging (e.g., on SpatialVizBench_CoT), switch to the OpenAI judge:
```bash
export OPENAI_API_KEY=YOUR_KEY
python run.py --data SpatialVizBench_CoT \
    --model {MODEL_NAME} \
    --verbose --reuse --judge gpt-4o-1120
```
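If the judge model is served behind an OpenAI-compatible endpoint rather than the official API, VLMEvalKit-style toolkits typically read the base URL from an environment variable. A hedged sketch, assuming your installed version honors `OPENAI_API_BASE` (the endpoint URL below is a placeholder):

```bash
# Assumption: the judge client honors OPENAI_API_BASE for OpenAI-compatible servers.
export OPENAI_API_BASE=https://your-endpoint.example.com/v1
export OPENAI_API_KEY=YOUR_KEY
```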
VLM Configuration: All supported VLMs are configured in vlmeval/config.py. Make sure you can successfully run inference with the VLM before starting the evaluation, e.g. with vlmutil check {MODEL_NAME}, as shown below.
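For instance, a quick sanity check of the example model used above (the model name must be registered in vlmeval/config.py):

```bash
# Runs the vlmutil check described above to confirm the model is correctly
# configured and can produce an inference result before a full evaluation.
vlmutil check SenseNova-SI-1.2-InternVL3-8B
```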
Benchmark Configuration: The full list of supported benchmarks can be found in the official VLMEvalKit documentation: VLMEvalKit Supported Benchmarks.
For the EASI Leaderboard, all EASI benchmarks are summarized in Supported Models & Benchmarks. A minimal example of recommended --data settings for EASI is:
| Benchmark | Evaluation settings |
|---|---|
| VSI-Bench | VSI-Bench_32frame |
| VSI-Bench-Debiased | VSI-Bench-Debiased_32frame |
| MindCube | MindCubeBench_tiny_raw_qa |
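The value in the Evaluation settings column is what gets passed to --data. For example, to run the recommended VSI-Bench setting (the judge mode here mirrors the earlier example and may need adjusting per benchmark):

```bash
# Pass the recommended setting name from the table directly to --data.
python run.py --data VSI-Bench_32frame \
    --model SenseNova-SI-1.2-InternVL3-8B \
    --verbose --reuse --judge extract_matching
```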
To submit your evaluation results to our EASI Leaderboard:
- Go to the EASI Leaderboard page.
- Click 🚀 Submit here! to open the submission form.
- Follow the instructions to fill in the submission form, and submit your results.
```bibtex
@article{easi2025,
  title={Holistic Evaluation of Multimodal LLMs on Spatial Intelligence},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}
```