SIGGRAPH ASIA 2023
SAILOR is a generalizable method for human free-view rendering and reconstruction from very sparse (e.g., 4) RGBD streams, achieving near real-time performance under acceleration.
Our free-view rendering results and bullet-time effects on our real-captured dataset (unseen performers).
Please install the Python dependencies listed in requirements.txt:
conda create -n SAILOR python=3.8
conda activate SAILOR
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
Install the ImplicitSeg surface localization library provided by MonoPort; it is required for our fast post-merging operation.
Our code has been tested under the following configurations:
- Ubuntu 18.04, 20.04 or 22.04
- Python 3.8 and PyTorch 1.8.0
- GCC/G++ 9.5.0
- NVIDIA GPU (RTX 3090), CUDA 11.1, cuDNN
Build the C++ and CUDA libraries:
- VoxelEncoding, FastNerf, and Mesh-RenderUtil:
cd c_lib/* && python setup.py install
VoxelEncoding provides CUDA-accelerated versions of TSDF-Fusion, two-layer tree construction, ray-voxel intersection, adaptive point sampling, etc. (a CPU reference sketch of TSDF fusion follows this list). FastNerf provides a fully-fused version of the MLPs and the Hydra-attention for our SRONet.
- AugDepth and Depth2Color [optional] (Eigen3, OpenCV, OpenMP, and pybind11 are required):
cd c_lib/*
mkdir build && cd build
cmake .. && make
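For orientation, the snippet below is a minimal CPU/NumPy sketch of the TSDF fusion that VoxelEncoding implements in CUDA. The function name, voxel-grid layout, truncation default, and the assumption that RT maps world to camera coordinates are ours for illustration; this is not the library's API.

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weight, origin, voxel_size, depth_m, K, RT, trunc=0.02):
    """Integrate one depth map (in meters) into a dense TSDF volume (illustrative only)."""
    D, H, W = tsdf.shape
    # World coordinates of all voxel centers (same flattening order as tsdf.reshape(-1))
    zz, yy, xx = np.meshgrid(np.arange(D), np.arange(H), np.arange(W), indexing='ij')
    pts_w = origin + (np.stack([xx, yy, zz], -1).reshape(-1, 3) + 0.5) * voxel_size
    # World -> camera -> pixel (RT assumed world-to-camera, [R | t])
    R, t = RT[:, :3], RT[:, 3]
    pts_c = pts_w @ R.T + t
    z = pts_c[:, 2]
    uv = pts_c @ K.T
    u = np.round(uv[:, 0] / np.maximum(z, 1e-8)).astype(np.int64)
    v = np.round(uv[:, 1] / np.maximum(z, 1e-8)).astype(np.int64)
    h, w = depth_m.shape
    valid = (z > 1e-8) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth_m[v[valid], u[valid]]
    valid &= d > 0
    # Truncated signed distance and running weighted average per voxel
    sdf = np.clip((d - z) / trunc, -1.0, 1.0)
    upd = valid & (d - z > -trunc)
    t_flat, w_flat = tsdf.reshape(-1), weight.reshape(-1)   # views into the volumes
    t_flat[upd] = (t_flat[upd] * w_flat[upd] + sdf[upd]) / (w_flat[upd] + 1.0)
    w_flat[upd] += 1.0
    return tsdf, weight
```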
- Clone or download this repo
- Download our pretrained depth denoising model (latest_model_BodyDRM2.pth) and our rendering model (latest_model_BasicRenNet.pth) here
- Move the downloaded models to the ./checkpoints_rend/SAILOR folder
The example static test data is provided in ./test_data. The data structure for static (or dynamic) data is as follows:
<dataset_name>
|-- COLOR
|   |-- FRAMExxxx
|       |-- 0.jpg   # input RGB image (1024x1024) for each view
|       |-- 1.jpg
|       ...
|-- DEPTH
|   |-- FRAMExxxx
|       |-- 0.png   # input depth image (1024x1024, uint16; values are in meters after dividing by 10000) for each view
|       |-- 1.png
|       ...
|-- MASK
|   |-- FRAMExxxx
|       |-- 0.png   # input human-region mask (1024x1024) for each view
|       |-- 1.png
|       ...
|-- PARAM
    |-- FRAMExxxx
        |-- 0.npy   # camera intrinsic ('K': 3x3) and pose ('RT': 3x4) matrices for each view
        |-- 1.npy
        ...
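A minimal loading sketch for a single view of one frame under this layout is shown below. The frame name is a hypothetical example, and the assumption that each .npy stores a pickled dictionary with keys 'K' and 'RT' is ours; check the released data for the exact packing.

```python
import cv2
import numpy as np

root, frame, view = './test_data/<dataset_name>', 'FRAME0000', 0   # hypothetical example names

color = cv2.imread(f'{root}/COLOR/{frame}/{view}.jpg')                           # 1024x1024x3, BGR
depth = cv2.imread(f'{root}/DEPTH/{frame}/{view}.png', cv2.IMREAD_UNCHANGED)     # uint16
depth_m = depth.astype(np.float32) / 10000.0                                     # depth in meters
mask = cv2.imread(f'{root}/MASK/{frame}/{view}.png', cv2.IMREAD_GRAYSCALE) > 0   # human region

param = np.load(f'{root}/PARAM/{frame}/{view}.npy', allow_pickle=True).item()    # assumed dict packing
K, RT = param['K'], param['RT']   # 3x3 intrinsic, 3x4 camera pose
```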
Depth denoising:
- Run python -m depth_denoising.inference
- The original and denoised point clouds are saved in the ./depth_denoising/results folder. Use MeshLab to visualize the 3D results (a back-projection sketch that produces such point clouds follows this list)
- Modify basic_path, frame_idx, and view_id in the file inference.py to obtain the results of other examples
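For reference, point clouds like those written to ./depth_denoising/results can be reproduced from a (denoised) depth map by back-projecting valid pixels with the per-view K and RT. The sketch below is not the repository's own code and assumes RT maps world to camera coordinates.

```python
import numpy as np

def depth_to_world_points(depth_m, mask, K, RT):
    """Back-project masked depth pixels (meters) to a world-space point cloud."""
    v, u = np.nonzero(mask & (depth_m > 0))
    z = depth_m[v, u]
    # Pixel -> camera space using the pinhole intrinsics K
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=-1)
    # Camera -> world: invert RT = [R | t] (RT assumed world-to-camera)
    R, t = RT[:, :3], RT[:, 3]
    return (pts_cam - t) @ R    # row-vector form of R^T (p - t)
```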
SRONet and SRONetUp:
- For the provided static data, run python -m upsampling.inference_static --name SAILOR (at 1K resolution) or python -m SRONet.inference_static --name SAILOR (at 512 resolution) to obtain the reconstructed 3D mesh and free-view rendering results
- The reconstructed 3D meshes are saved in the ./checkpoints_rend/SAILOR/val_results folder. To render a 3D mesh, run python -m utils_render.render_mesh to obtain the free-view mesh rendering results. Modify opts.ren_data_root, obj_path, and obj_name in the file render_mesh.py to get new results (a projection sanity-check sketch follows this list)
- For dynamic data, first download our real-captured data here, unzip the data, and put it in the ./test_data folder
- For dynamic data, then run python -m upsampling.inference_dynamic --name SAILOR or python -m SRONet.inference_dynamic --name SAILOR to obtain the rendering results
- Modify opts.ren_data_root and opts.data_name in inference_static.py and inference_dynamic.py to obtain new rendering results
- The rendered images and videos are saved in the ./SRONet/results (or ./upsampling/results) folder
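As an optional sanity check on an exported mesh and the camera parameters, the mesh vertices can be projected back into one input view. The sketch below is illustrative only (the paths are hypothetical and trimesh is used just for loading); it is not part of the repository's utils_render.

```python
import cv2
import numpy as np
import trimesh

# Hypothetical paths: point them at an exported mesh and the matching frame data
mesh = trimesh.load('./checkpoints_rend/SAILOR/val_results/example.obj', process=False)
param = np.load('./test_data/<dataset_name>/PARAM/FRAME0000/0.npy', allow_pickle=True).item()
K, RT = param['K'], param['RT']

# Project world-space vertices into view 0 (RT assumed world-to-camera)
verts_cam = mesh.vertices @ RT[:, :3].T + RT[:, 3]
uv = verts_cam @ K.T
z = np.clip(verts_cam[:, 2], 1e-6, None)
u = np.round(uv[:, 0] / z).astype(int)
v = np.round(uv[:, 1] / z).astype(int)

img = cv2.imread('./test_data/<dataset_name>/COLOR/FRAME0000/0.jpg')
in_img = (verts_cam[:, 2] > 0) & (u >= 0) & (u < img.shape[1]) & (v >= 0) & (v < img.shape[0])
img[v[in_img], u[in_img]] = (0, 255, 0)   # mark projected vertices in green
cv2.imwrite('projection_check.png', img)
```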
Interactive rendering:
We release our interactive rendering GUI for our real-captured dataset.
- TensorRT is required to accelerate our depth denoising network and the encoders in SRONet (upsampling). Please refer to the TensorRT installation guide and then install torch2trt. Our TensorRT version is 7.2
- Run python -m depth_denoising.toTensorRT, python -m SRONet.toTensorRT, and python -m upsampling.toTensorRT to obtain the TRTModules (the parameter opts.num_gpus in toTensorRT.py controls the number of GPUs). The final .pth models are saved in the ./SAILOR/accelerated_models folder (a minimal torch2trt conversion sketch follows this list)
- Run python -m gui.gui_render. Modify opts.ren_data_root in gui_render.py to test other data, and modify opts.num_gpus to use 1 GPU (slow) or 2 GPUs. The GIF below shows the rendering result obtained with two NVIDIA RTX 3090 GPUs, an Intel i9-13900K CPU, and an MSI Z790 Godlike motherboard
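For orientation, converting a PyTorch module with torch2trt follows the pattern below. The module and input shape are placeholders; the toTensorRT.py scripts in this repo handle the actual networks and the multi-GPU setup.

```python
import torch
from torch import nn
from torch2trt import torch2trt, TRTModule

# Placeholder module and input shape (the repo's scripts convert the real networks)
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).cuda().eval()
x = torch.randn(1, 3, 512, 512).cuda()

model_trt = torch2trt(model, [x], fp16_mode=True)        # build the TensorRT engine
torch.save(model_trt.state_dict(), 'encoder_trt.pth')    # serialize the converted module

# Later, reload without rebuilding the engine
model_trt2 = TRTModule()
model_trt2.load_state_dict(torch.load('encoder_trt.pth'))
y = model_trt2(x)
```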
The code, models, and GUI demos in this repository are released under the GPL-3.0 license.
If you find our work helpful to your research, please cite our paper.
@article{dong2023sailor,
author = {Dong, Zheng and Xu, Ke and Gao, Yaoan and Sun, Qilin and Bao, Hujun and Xu, Weiwei and Lau, Rynson W.H.},
title = {SAILOR: Synergizing Radiance and Occupancy Fields for Live Human Performance Capture},
year = {2023},
journal = {ACM Transactions on Graphics (TOG)},
volume = {42},
number = {6},
doi = {10.1145/3618370},
publisher = {ACM}
}



