RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding

ICLR 2026

Ryan Griffiths, Donald G. Dansereau

Project Page, Paper

Setup

Environment

A requirements.txt and a Dockerfile are provided.

pip install -r requirements.txt 

or

bash docker_build.sh
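
Once built, the image can be run with GPU access along these lines (a sketch, not a confirmed invocation: the image tag and mount point are placeholders, so check docker_build.sh for the actual image name):

# Run the built image interactively with GPU access and the dataset mounted;
# <image-tag> is whatever tag docker_build.sh assigns.
docker run --gpus all -it -v /path/to/data:/data <image-tag>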

Data

For RealEstate10K we use the same data format as pixelSplat; please follow the data formatting instructions provided there. You can also download a preprocessed dataset here. The dataset can be left in the zip file and loaded directly from it.
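
As a quick sanity check on the download, you can list the archive's contents without extracting it (a minimal sketch using the standard unzip tool; nothing here is specific to this repository):

# List the first entries of the preprocessed archive; unzip -l reads the
# central directory without unpacking anything.
unzip -l /path/to/re10k.zip | head -n 20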

A subset of our synthetic multimodal dataset can be found here.

Usage

Training

Training is run as follows.

For RGB-only training on RealEstate10K:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/re10k.zip

For RGB-thermal training on our synthetic multimodal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/MultimodalBlender --dataset_type multimodal

Alternative ray encodings and positional embeddings can be selected via the --ray_encoding and --pos_enc flags; see the sketch below.
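
For example, to train with a different combination (a sketch only: <encoding> and <embedding> are placeholders for whichever alternatives nvs.sh supports, which are not enumerated here):

# Swap in an alternative ray encoding and positional embedding.
bash ./nvs.sh --ray_encoding <encoding> --pos_enc <embedding> --gpus "0,1,..." --dataset_path /path/to/re10k.zip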

Validation

Validation at different zoom-in (focal length) factors can be run via:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-zoom-in "2" --dataset_path "/path/to/re10k.zip"

And at different synthetic distortion levels on RealEstate10K:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-distortion "2" --dataset_path "/path/to/re10k.zip"

Testing only on the multimodal dataset:

bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path "/path/to/MultimodalBlender" --dataset_type multimodal --test-only

NOTE: For simplicity, the same network architecture is used here for both RGB and RGB-thermal. In the paper, the multimodal network differed slightly from the RGB-only network.

Acknowledgement

This repository is built on top of the LVSM and PRoPE repositories; we thank their authors for making their work open-source.

Citation

If you find this work useful, please consider citing our work:

@inproceedings{rore2026,
  title={RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding},
  author={Ryan Griffiths and Donald G. Dansereau},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=BR2ItBcqOo}
}
