A requirements.txt and a Dockerfile are provided. Install the dependencies with either:
pip install -r requirements.txt
or
bash docker_build.sh
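If you take the Docker route, you can then start a container from the built image. A minimal sketch, assuming the build script tags the image as rore (check docker_build.sh for the actual tag):

# Run an interactive container with GPU access and the repository mounted.
# "rore:latest" is a placeholder image name -- check docker_build.sh for the real tag.
docker run --gpus all -it --rm -v "$(pwd)":/workspace -w /workspace rore:latest bash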
For RealEstate10K we use the same data format as pixelSplat; please follow the data formatting instructions provided there. You can also download a preprocessed dataset here. The dataset can be left in the zip file and loaded directly from it.
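The zip does not need to be extracted; as a quick sanity check you can list its contents. The layout suggested below (per-split .torch chunk files plus an index.json, as in pixelSplat's preprocessed release) is an assumption, so verify it against the pixelSplat instructions:

# Peek at the preprocessed RealEstate10K zip without extracting it.
# Expected pixelSplat-style layout (assumed): train/*.torch, test/*.torch and an index.json per split.
unzip -l /path/to/re10k.zip | head -n 20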
A subset of our synthetic multimodal dataset can be found here.
Training is run as follows.
For training rgb on RealEstate10K:
bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/re10k.zip
For training rgb-thermal on our synthetic rgb-thermal dataset:
bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path /path/to/MultimodalBlender --dataset_type multimodal
Alternative embedding methods can be selected via the --ray_encoding and --pos_enc flags.
Validation at different zoom-in (focal length) factors can be run via:
bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-zoom-in "2" --dataset_path "/path/to/re10k.zip"
And at different synthetic distortion factors on RealEstate10K:
bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --test-distortion "2" --dataset_path "/path/to/re10k.zip"
Testing only on the multimodal dataset:
bash ./nvs.sh --ray_encoding plucker --pos_enc rore --gpus "0,1,..." --dataset_path "/path/to/MultimodalBlender" --dataset_type multimodal --test-only
NOTE: For simplicity, the network architecture used here for rgb and rgb-thermal is the same. In the paper, the multimodal network differed slightly from the rgb-only network.
This repository is built on top of the LVSM and PRoPE repositories; we thank their authors for making their work open-source.
If you find this work useful, please consider citing:
@inproceedings{rore2026,
  title={RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding},
  author={Ryan Griffiths and Donald G. Dansereau},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=BR2ItBcqOo}
}