diff --git a/README.md b/README.md
index 50aeb1e..6a24582 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,29 @@
-# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping
+# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and A Grasping Baseline
 
-[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/transcg-a-large-scale-real-world-dataset-for/transparent-object-depth-estimation-on)](https://paperswithcode.com/sota/transparent-object-depth-estimation-on?p=transcg-a-large-scale-real-world-dataset-for) [![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]
+[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]
 
-[[Paper]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)
+[[Paper (IEEE Xplore)]](https://ieeexplore.ieee.org/document/9796631) [[Paper (arXiv)]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)
 
 **Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Hao-Shu Fang](https://github.com/fang-haoshu), [Sheng Xu](https://github.com/XS1020), [Cewu Lu](https://mvig.sjtu.edu.cn/).
 
 Welcome to the official repository for the TransCG paper. This repository includes the dataset and the proposed Depth Filler Net (DFNet) models.
 
+## News
+
+2022-10-14: The corrected checkpoint has been uploaded. Check the [Quick Start](#quick-start) section for details. Many thanks to [@ZhiyangZhou24](https://github.com/ZhiyangZhou24) for reporting the issue.
+
+2022-10-10: A new checkpoint and updated source code are released. Check the [Quick Start](#quick-start) section for details. This release fixes the shifting problem to a large extent (only ~2 cm of shifting remains, which can be reduced further with simple engineering methods) and uses interpolation to solve the empty-hole problem. Many thanks to [@haojieh](https://github.com/haojieh) and [@mtbui2010](https://github.com/mtbui2010) for mentioning it. The new checkpoint improves several metrics; see details in [assets/docs/DFNet.md](assets/docs/DFNet.md).
+
+2022-10-02: For the checkpoint and source code that correspond to the paper, please see [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of our repository. The shifting problem in that version can be mitigated by calculating the difference between the average depth before and after refinement, and then subtracting this difference from the refined depths.
+
+2022-09-16: A new version of the DFNet code is released. Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs and [@haberger](https://github.com/haberger) for mentioning it.
+
+2022-06-15: Our TransCG paper is published in IEEE Robotics and Automation Letters, Vol. 7, No. 3, 2022, and is available at [IEEE Xplore](https://ieeexplore.ieee.org/document/9796631).
+
+2022-06-01: Our TransCG paper is accepted by RA-L.
+
+2022-02-17: Our paper is released on [arXiv](https://arxiv.org/pdf/2202.08471) and submitted to IEEE Robotics and Automation Letters (RA-L).
+
 ## TransCG Dataset
 
 The TransCG dataset is now available on the [official page](https://graspnet.net/transcg). It is the first large-scale real-world dataset for transparent object depth completion and grasping. In total, our dataset contains 57,715 RGB-D images of 51 transparent objects and many opaque objects, captured from different perspectives of 130 scenes under various real-world settings. The 3D mesh models of the transparent objects are also provided in our dataset.
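As a side note to the dataset description above, a single sample can be read straight from the released files. The sketch below only illustrates the file layout used by `datasets/transcg.py` and `sample_inference.py` in this diff; the dataset root, scene and perspective indices are placeholders, and the millimetre-to-metre conversion follows the D435 handling in `sample_inference.py`.

```python
# Minimal sketch: load one TransCG sample from disk (paths are illustrative only).
import numpy as np
from PIL import Image

sample_dir = 'data/scene1/0'   # data/scene{scene_id}/{perspective_id} -- placeholders
camera_type = 1                # 1: RealSense D435, 2: RealSense L515

rgb = np.array(Image.open(f'{sample_dir}/rgb{camera_type}.png'), dtype = np.float32)
depth = np.array(Image.open(f'{sample_dir}/depth{camera_type}.png'), dtype = np.float32) / 1000.0        # mm -> m (D435)
depth_gt = np.array(Image.open(f'{sample_dir}/depth{camera_type}-gt.png'), dtype = np.float32) / 1000.0  # mm -> m (D435)
depth_gt_mask = np.array(Image.open(f'{sample_dir}/depth{camera_type}-gt-mask.png'), dtype = np.uint8)

cam_intrinsics = np.load('data/camera_intrinsics/1-camIntrinsics-D435.npy')
print(rgb.shape, depth.shape, depth_gt.shape, depth_gt_mask.shape, cam_intrinsics.shape)
```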
@@ -41,9 +57,7 @@ pip install -r requirements.txt
 
 ### Quick Start
 
-**NOTE.** The following checkpoint is compatible with [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76). We will update the checkpoint of the latest version later.
-
-Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoints for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
+Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1oZi9zdOg0WYuTHM10xlyq5FRlfoKDKzU/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/1G9OaZ1Kk-KmHWOUHARsgNQ) (Code: bpes). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoint for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
 
 ### Grasping Demo
diff --git a/assets/docs/DFNet.md b/assets/docs/DFNet.md
index 27a98c7..98c1085 100644
--- a/assets/docs/DFNet.md
+++ b/assets/docs/DFNet.md
@@ -10,15 +10,23 @@ The architecture of our proposed end-to-end depth completion network DFNet is sh
 
 ## Experiments
 
-
+| Method | RMSE | REL | MAE | Delta 1.05 | Delta 1.10 | Delta 1.25 | GPU Mem. Occ. | Infer. Time | Model Size |
+| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
+| ClearGrasp | 0.054 | 0.083 | 0.037 | 50.48 | 68.68 | 95.28 | 2.1 GB | 2.2813s | 934 MB |
+| LIDF-Refine | 0.019 | 0.034 | 0.015 | 78.22 | 94.26 | 99.80 | 6.2 GB | 0.0182s | 251 MB |
+| TranspareNet* | 0.026 | **0.023** | **0.013** | **88.45** | 96.25 | 99.42 | 1.9 GB | 0.0354s | 336 MB |
+| DFNet** | **0.018** | 0.026 | **0.013** | 84.94 | **96.57** | **99.85** | **1.6 GB** | **0.0166s** | **5.2 MB** |
 
-Experiments demonstrate superior efficacy, efficiency and robustness of our method over previous works, and it is able to process images of high resolutions under limited hardware resources, as shown in the following figures.
+Here, ClearGrasp refers to [1], LIDF-Refine refers to [2], TranspareNet refers to [3], and DFNet refers to our proposed Depth Filler Net.
 
-
+*: TranspareNet [3] is a concurrent work with our project.
 
-**Note**. In experiment tables above, ClearGrasp (or [34]) refers to "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation" (ICRA 2020), and LIDF-Refine (or [41]) refers to "RGB-D Local Implicit Function for Depth Completion of Transparent Objects" (CVPR 2021).
+**: Here, we use the newly released checkpoint of DFNet, which is slightly different from the one used in the paper. The new checkpoint fixes the point-cloud shifting bug mentioned in [Issue #4](https://github.com/Galaxies99/TransCG/issues/4) and the black-hole problem mentioned in [Issue #7](https://github.com/Galaxies99/TransCG/issues/7).
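Putting the Quick Start above together with the updated `inference()` API in this diff, usage of the released checkpoint looks roughly like the sketch below (adapted from `sample_inference.py`; the sample path is a placeholder, and the checkpoint location is assumed to be whatever `configs/inference.yaml` points to).

```python
# Rough usage sketch for the released checkpoint, mirroring sample_inference.py.
import numpy as np
from PIL import Image
from inference import Inferencer  # assumption: Inferencer is imported from inference.py

inferencer = Inferencer()  # reads configs/inference.yaml by default

rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype = np.float32)
depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype = np.float32) / 1000.0  # mm -> m

# The updated inference() returns both the refined depth and the filtered/inpainted input depth.
res, depth_in = inferencer.inference(rgb, depth, depth_coefficient = 3, inpainting = True)
res = np.clip(res, 0.3, 1.0)  # clip to the configured depth range
```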
+
+For the original checkpoint used in the paper, please use [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of the repository, and download the checkpoint from [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs in [Issue #5](https://github.com/Galaxies99/TransCG/issues/5).
 
 ## References
 
 1. Sajjan, Shreeyak, et al. "Clear grasp: 3d shape estimation of transparent objects for manipulation." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
 2. Zhu, Luyang, et al. "RGB-D Local Implicit Function for Depth Completion of Transparent Objects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
+3. Xu, Haoping, et al. "Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects." 5th Annual Conference on Robot Learning. 2021.
diff --git a/assets/docs/grasping.md b/assets/docs/grasping.md
index 1c515a4..5797e30 100644
--- a/assets/docs/grasping.md
+++ b/assets/docs/grasping.md
@@ -6,8 +6,6 @@ Given an RGB image along with a depth image collected by an RGB-D camera, we fir
 
 ## Experiments
 
-
-
 ## Reference
 
 1. Fang, Hao-Shu, et al. "Graspnet-1billion: A large-scale benchmark for general object grasping." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
diff --git a/assets/imgs/exper1.png b/assets/imgs/exper1.png
deleted file mode 100644
index 5ef8bd5..0000000
Binary files a/assets/imgs/exper1.png and /dev/null differ
diff --git a/assets/imgs/exper2.png b/assets/imgs/exper2.png
deleted file mode 100644
index 8e7655d..0000000
Binary files a/assets/imgs/exper2.png and /dev/null differ
diff --git a/assets/imgs/exper3.png b/assets/imgs/exper3.png
deleted file mode 100644
index d4325a3..0000000
Binary files a/assets/imgs/exper3.png and /dev/null differ
diff --git a/configs/default.yaml b/configs/default.yaml
index f540c0b..3cd23bb 100644
--- a/configs/default.yaml
+++ b/configs/default.yaml
@@ -25,16 +25,18 @@
         "use_augmentation": True
         "rgb_augmentation_probability": 0.8
         "depth_min": 0.3
-        "depth_max": 1.5
+        "depth_max": 1.0
         "depth_norm": 1.0
+        "with_original": True
     "test":
         "type": "transcg"
         "data_dir": "data"
         "image_size": !!python/tuple [320, 240]
         "use_augmentation": False
         "depth_min": 0.3
-        "depth_max": 1.5
+        "depth_max": 1.0
         "depth_norm": 1.0
+        "with_original": True
 
 "dataloader":
     "num_workers": 48
diff --git a/configs/inference.yaml b/configs/inference.yaml
index cf5010e..0fc17dc 100644
--- a/configs/inference.yaml
+++ b/configs/inference.yaml
@@ -11,5 +11,5 @@
     "image_size": !!python/tuple [320, 240]
     "cuda_id": 0
     "depth_min": 0.3
-    "depth_max": 1.5
+    "depth_max": 1.0
     "depth_norm": 1.0
diff --git a/datasets/transcg.py b/datasets/transcg.py
index 97715f3..9d8779b 100644
--- a/datasets/transcg.py
+++ b/datasets/transcg.py
@@ -51,13 +51,15 @@ def __init__(self, data_dir, split = 'train', **kwargs):
                 self.sample_info.append([
                     os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
                     1, # (for D435)
-                    scene_type
+                    scene_type,
+                    perspective_id
                 ])
             for perspective_id in self.scene_metadata[scene_id]['L515_valid_perspective_list']:
                 self.sample_info.append([
                     os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
                     2, # (for L515)
-                    scene_type
+                    scene_type,
+                    perspective_id
                 ])
         # Integrity double-check
         assert len(self.sample_info) == self.total_samples, "Error in total samples, expect {} samples, found {} samples.".format(self.total_samples, len(self.sample_info))
@@ -72,12 +74,16 @@ def __init__(self, data_dir, split = 'train', **kwargs):
         self.with_original = kwargs.get('with_original', False)
 
     def __getitem__(self, id):
-        img_path, camera_type, scene_type = self.sample_info[id]
+        img_path, camera_type, scene_type, perspective_id = self.sample_info[id]
         rgb = np.array(Image.open(os.path.join(img_path, 'rgb{}.png'.format(camera_type))), dtype = np.float32)
         depth = np.array(Image.open(os.path.join(img_path, 'depth{}.png'.format(camera_type))), dtype = np.float32)
         depth_gt = np.array(Image.open(os.path.join(img_path, 'depth{}-gt.png'.format(camera_type))), dtype = np.float32)
         depth_gt_mask = np.array(Image.open(os.path.join(img_path, 'depth{}-gt-mask.png'.format(camera_type))), dtype = np.uint8)
-        return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, with_original = self.with_original)
+        if camera_type == 1:
+            depth_coeff = (perspective_id // 20) / 12 + 1
+        else:
+            depth_coeff = 1
+        return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, depth_coeff = depth_coeff, inpainting = True, with_original = self.with_original)
 
     def __len__(self):
         return self.total_samples
diff --git a/inference.py b/inference.py
index 46090ec..37c29e3 100644
--- a/inference.py
+++ b/inference.py
@@ -15,6 +15,7 @@
 from utils.logger import ColoredLogger
 from utils.builder import ConfigBuilder
 from time import perf_counter
+from scipy.interpolate import NearestNDInterpolator
 
 
 class Inferencer(object):
@@ -68,7 +69,7 @@ def __init__(self, cfg_path = os.path.join('configs', 'inference.yaml'), with_in
         self.depth_min, self.depth_max = self.builder.get_inference_depth_min_max()
         self.depth_norm = self.builder.get_inference_depth_norm()
 
-    def inference(self, rgb, depth, target_size = (1280, 720)):
+    def inference(self, rgb, depth, target_size = (1280, 720), depth_coefficient = 10.0, inpainting = True):
         """
         Inference.
 
@@ -77,7 +78,11 @@ def inference(self, rgb, depth, target_size = (1280, 720)):
 
         rgb, depth: the initial RGB-D image;
 
-        target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size.
+        target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size;
+
+        depth_coefficient: float, optional, default: 10.0, only regard [depth_mu - depth_coefficient * depth_std, depth_mu + depth_coefficient * depth_std] as the valid pixels;
+
+        inpainting: bool, optional, default: True, whether to inpaint the invalid pixels.
 
        Returns
        -------
@@ -90,7 +95,20 @@
        depth = np.where(depth < self.depth_min, 0, depth)
        depth = np.where(depth > self.depth_max, 0, depth)
        depth[np.isnan(depth)] = 0
+       depth_available = depth[depth > 0]
+       depth_mu = depth_available.mean() if depth_available.shape[0] != 0 else 0
+       depth_std = depth_available.std() if depth_available.shape[0] != 0 else 1
+       depth = np.where(depth < depth_mu - depth_coefficient * depth_std, 0, depth)
+       depth = np.where(depth > depth_mu + depth_coefficient * depth_std, 0, depth)
+       if inpainting:
+           mask = np.where(depth > 0)
+           if mask[0].shape[0] != 0:
+               interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
+               depth = interp(*np.indices(depth.shape))
        depth = depth / self.depth_norm
+       depth_min = depth.min() - 0.5 * depth.std() - 1e-6
+       depth_max = depth.max() + 0.5 * depth.std() + 1e-6
+       depth = (depth - depth_min) / (depth_max - depth_min)
        rgb = (rgb / 255.0).transpose(2, 0, 1)
        rgb = torch.FloatTensor(rgb).to(self.device).unsqueeze(0)
        depth = torch.FloatTensor(depth).to(self.device).unsqueeze(0)
@@ -101,7 +119,12 @@
        if self.with_info:
            self.logger.info("Inference finished, time: {:.4f}s.".format(time_end - time_start))
        depth_res = depth_res.squeeze(0).cpu().detach().numpy()
+       depth_ori = depth.squeeze(0).cpu().detach().numpy()
+       depth_res = depth_res * (depth_max - depth_min) + depth_min
+       depth_ori = depth_ori * (depth_max - depth_min) + depth_min
        depth_res = depth_res * self.depth_norm
+       depth_ori = depth_ori * self.depth_norm
        depth_res = cv2.resize(depth_res, target_size, interpolation = cv2.INTER_NEAREST)
-       return depth_res
+       depth_ori = cv2.resize(depth_ori, target_size, interpolation = cv2.INTER_NEAREST)
+       return depth_res, depth_ori
\ No newline at end of file
diff --git a/models/DFNet.py b/models/DFNet.py
index 161f400..fccc949 100644
--- a/models/DFNet.py
+++ b/models/DFNet.py
@@ -149,8 +149,7 @@ def __init__(self, in_channels = 4, hidden_channels = 64, L = 5, k = 12, use_DUC
            nn.Conv2d(self.hidden_channels, self.hidden_channels, kernel_size = 3, stride = 1, padding = 1),
            nn.BatchNorm2d(self.hidden_channels),
            nn.ReLU(True),
-           nn.Conv2d(self.hidden_channels, 1, kernel_size = 3, stride = 1, padding = 1),
-           nn.ReLU(True)
+           nn.Conv2d(self.hidden_channels, 1, kernel_size = 1, stride = 1)
        )
 
    def _make_upconv(self, in_channels, out_channels, upscale_factor = 2):
diff --git a/sample_inference.py b/sample_inference.py
index 520e112..b65299d 100644
--- a/sample_inference.py
+++ b/sample_inference.py
@@ -45,19 +45,19 @@ def draw_point_cloud(color, depth, camera_intrinsics, use_mask = False, use_inpa
 
 inferencer = Inferencer()
 
-rgb = np.array(Image.open('data/scene1/1/rgb1.png'), dtype = np.float32)
-depth = np.array(Image.open('data/scene1/1/depth1.png'), dtype = np.float32)
-depth_gt = np.array(Image.open('data/scene1/1/depth1-gt.png'), dtype = np.float32)
+rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype = np.float32)
+depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype = np.float32)
+depth_gt = np.array(Image.open('data/scene21/1/depth1-gt.png'), dtype = np.float32)
 
 depth = depth / 1000
 depth_gt = depth_gt / 1000
 
-res = inferencer.inference(rgb, depth)
+res, depth = inferencer.inference(rgb, depth, depth_coefficient = 3, inpainting = True)
 
 cam_intrinsics = np.load('data/camera_intrinsics/1-camIntrinsics-D435.npy')
 
-res = np.clip(res, 0.1, 1.5)
-depth = np.clip(depth, 0.1, 1.5)
+res = np.clip(res, 0.3, 1.0)
+depth = np.clip(depth, 0.3, 1.0)
 
 cloud = draw_point_cloud(rgb, res, cam_intrinsics, scale = 1.0)
 cloud_gt = draw_point_cloud(rgb, depth_gt, cam_intrinsics, scale = 1.0)
diff --git a/test.py b/test.py
index 02192f7..7ba079a 100644
--- a/test.py
+++ b/test.py
@@ -73,6 +73,8 @@ def test():
            time_start = perf_counter()
            res = model(data_dict['rgb'], data_dict['depth'])
            time_end = perf_counter()
+           depth_scale = data_dict['depth_max'] - data_dict['depth_min']
+           res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
            data_dict['pred'] = res
            _ = metrics.evaluate_batch(data_dict, record = True)
            duration = time_end - time_start
diff --git a/train.py b/train.py
index 1de4a62..e6943db 100644
--- a/train.py
+++ b/train.py
@@ -90,7 +90,9 @@ def train_one_epoch(epoch):
    for data_dict in pbar:
        optimizer.zero_grad()
        data_dict = to_device(data_dict, device)
-       res = model(data_dict['rgb'], data_dict['depth'])
+       res = model(data_dict['rgb'], data_dict['depth'])
+       depth_scale = data_dict['depth_max'] - data_dict['depth_min']
+       res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
        data_dict['pred'] = res
        loss_dict = criterion(data_dict)
        loss = loss_dict['loss']
@@ -118,6 +120,8 @@ def test_one_epoch(epoch):
            time_start = perf_counter()
            res = model(data_dict['rgb'], data_dict['depth'])
            time_end = perf_counter()
+           depth_scale = data_dict['depth_max'] - data_dict['depth_min']
+           res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
            data_dict['pred'] = res
            loss_dict = criterion(data_dict)
            loss = loss_dict['loss']
diff --git a/utils/builder.py b/utils/builder.py
index 2789819..78bec28 100644
--- a/utils/builder.py
+++ b/utils/builder.py
@@ -479,7 +479,7 @@ def get_inference_depth_norm(self, inference_params = None):
        Parameters
        ----------
-       inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth range. Otherwise, the inference parameters in the self.params will be used to get the inference depth range.
+       inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth normalization coefficient. Otherwise, the inference parameters in the self.params will be used to get the inference depth normalization coefficient.
 
        Returns
        -------
diff --git a/utils/data_preparation.py b/utils/data_preparation.py
index 0df3a95..e78d5ef 100644
--- a/utils/data_preparation.py
+++ b/utils/data_preparation.py
@@ -13,6 +13,7 @@
 import numpy as np
 from utils.functions import get_surface_normal_from_depth
 from utils.constants import DILATION_KERNEL
+from scipy.interpolate import NearestNDInterpolator
 
 
 def chromatic_transform(image):
@@ -147,7 +148,25 @@ def exr_loader(exr_path, ndim = 3, ndim_representation = ['R', 'G', 'B']):
    return exr_arr
 
 
-def process_depth(depth, camera_type = 0, depth_min = 0.3, depth_max = 1.5, depth_norm = 1.0):
+def depth_inpainting(depth):
+   mask = np.where(depth > 0)
+   if mask[0].shape[0] != 0:
+       interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
+       depth = interp(*np.indices(depth.shape))
+   return depth
+
+def process_depth(
+   depth,
+   camera_type = 0,
+   depth_min = 0.3,
+   depth_max = 1.5,
+   depth_norm = 1.0,
+   depth_mu = None,
+   depth_std = None,
+   depth_coeff = 10.0,
+   return_mu_std = False,
+   inpainting = False
+):
    """
    Process the depth information, including scaling, normalization and clearing NaN values.
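The new `depth_mu` / `depth_std` / `depth_coeff` / `inpainting` arguments introduced above amount to a statistical validity filter followed by nearest-neighbour hole filling. A standalone sketch of that logic is shown below; it uses synthetic data and a hypothetical helper name, and is not repository code.

```python
# Sketch of the filtering + inpainting added to process_depth() / depth_inpainting().
import numpy as np
from scipy.interpolate import NearestNDInterpolator

def filter_and_inpaint(depth, depth_coeff = 10.0):
    # Keep only pixels within [mu - coeff * std, mu + coeff * std]; zero marks invalid pixels.
    valid = depth[depth > 0]
    mu = valid.mean() if valid.size else 0.0
    std = valid.std() if valid.size else 1.0
    depth = np.where(depth < mu - depth_coeff * std, 0, depth)
    depth = np.where(depth > mu + depth_coeff * std, 0, depth)
    # Fill the remaining invalid pixels with their nearest valid neighbour.
    mask = np.where(depth > 0)
    if mask[0].size:
        interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
        depth = interp(*np.indices(depth.shape))
    return depth

depth = np.random.uniform(0.3, 1.0, size = (240, 320)).astype(np.float32)
depth[::7, ::5] = 0.0                                   # simulate missing pixels
completed = filter_and_inpaint(depth, depth_coeff = 3.0)
```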
@@ -163,7 +182,17 @@
    depth_min, depth_max: int, optional, default: 0.3, 1.5, the min depth and the max depth;
 
-   depth_norm: float, optional, default: 1.0, the depth normalization coefficient.
+   depth_norm: float, optional, default: 1.0, the depth normalization coefficient;
+
+   depth_mu, depth_std: float, optional, default: None, specify the mu and std of depth, set None to use automatic detection;
+
+   depth_coeff: float, optional, default: 10.0, set the depth out of range [avg - coeff * std, avg + coeff * std] to 0 (unavailable);
+
+   return_mu_std: bool, optional, default: False, whether to return the mu and std of the depth;
+
+   inpainting: bool, optional, default: False, whether to inpaint the missing pixels.
 
    Returns
    -------
@@ -179,7 +208,18 @@
    depth[np.isnan(depth)] = 0.0
    depth = np.where(depth < depth_min, 0, depth)
    depth = np.where(depth > depth_max, 0, depth)
+   depth_available = depth[depth > 0]
+   if depth_mu is None:
+       depth_mu = depth_available.mean() if depth_available.shape[0] != 0 else 0
+   if depth_std is None:
+       depth_std = depth_available.std() if depth_available.shape[0] != 0 else 1
+   depth = np.where(depth < depth_mu - depth_coeff * depth_std, 0, depth)
+   depth = np.where(depth > depth_mu + depth_coeff * depth_std, 0, depth)
+   if inpainting:
+       depth = depth_inpainting(depth)
    depth = depth / depth_norm
+   if return_mu_std:
+       return depth, depth_mu, depth_std
    return depth
@@ -198,6 +238,8 @@ def process_data(
    depth_norm = 10,
    use_aug = True,
    rgb_aug_prob = 0.8,
+   depth_coeff = 10.0,
+   inpainting = False,
    with_original = False,
    **kwargs):
    """
@@ -235,6 +277,10 @@
    rgb_aug_prob: float, optional, default: 0.8, the rgb augmentation probability (only applies when use_aug is set to True);
 
+   depth_coeff: float, optional, default: 10.0, set the depth out of range [avg - coeff * std, avg + coeff * std] to 0 (unavailable);
+
+   inpainting: bool, optional, default: False, whether to inpaint the missing pixels;
+
    with_original: bool, optional, default: False, whether to return original images.
 
    Returns
@@ -255,8 +301,8 @@
    depth_gt_mask = depth_gt_mask.astype(np.bool)
 
    # depth processing
-   depth = process_depth(depth, camera_type = camera_type, depth_min = depth_min, depth_max = depth_max, depth_norm = depth_norm)
-   depth_gt = process_depth(depth_gt, camera_type = camera_type, depth_min = depth_min, depth_max = depth_max, depth_norm = depth_norm)
+   depth, d_mu, d_std = process_depth(depth, camera_type = camera_type, depth_min = depth_min, depth_max = depth_max, depth_norm = depth_norm, depth_mu = None, depth_std = None, depth_coeff = depth_coeff, return_mu_std = True, inpainting = inpainting)
+   depth_gt = process_depth(depth_gt, camera_type = camera_type, depth_min = depth_min, depth_max = depth_max, depth_norm = depth_norm, depth_mu = d_mu, depth_std = d_std, depth_coeff = depth_coeff, return_mu_std = False, inpainting = False)
 
    # RGB augmentation.
    if split == 'train' and use_aug and np.random.rand(1) > 1 - rgb_aug_prob:
@@ -299,9 +345,15 @@
    zero_mask = np.logical_not(neg_zero_mask)
    zero_mask_dilated = np.logical_not(neg_zero_mask_dilated)
 
+   # inpainting depth now
+   depth_gt = depth_inpainting(depth_gt)
+
    # loss mask
    initial_loss_mask = np.logical_and(depth_gt_mask, zero_mask)
    initial_loss_mask_dilated = np.logical_and(depth_gt_mask, zero_mask_dilated)
+
+   loss_mask = initial_loss_mask
+   loss_mask_dilated = initial_loss_mask_dilated
    if scene_mask:
        loss_mask = initial_loss_mask
        loss_mask_dilated = initial_loss_mask_dilated
@@ -309,9 +361,16 @@
        loss_mask = zero_mask
        loss_mask_dilated = zero_mask_dilated
 
+   # Normalization
+   depth_min = depth.min() - 0.5 * depth.std() - 1e-6
+   depth_max = depth.max() + 0.5 * depth.std() + 1e-6
+   depth = (depth - depth_min) / (depth_max - depth_min)
+
    data_dict = {
        'rgb': torch.FloatTensor(rgb),
        'depth': torch.FloatTensor(depth),
+       'depth_min': torch.tensor(depth_min),
+       'depth_max': torch.tensor(depth_max),
        'depth_gt': torch.FloatTensor(depth_gt),
        'depth_gt_mask': torch.BoolTensor(depth_gt_mask),
        'scene_mask': torch.tensor(scene_mask),
diff --git a/utils/functions.py b/utils/functions.py
index 50df5c7..a921223 100644
--- a/utils/functions.py
+++ b/utils/functions.py
@@ -3,7 +3,6 @@
 Authors: Hongjie Fang.
 """
-from email.policy import default
 import torch
 import einops
 import numpy as np
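The per-image normalisation added to `process_data()` above and the matching de-normalisation added to `train.py` / `test.py` form a round trip. A small self-contained check under those assumptions (synthetic tensor, single-image batch, not repository code):

```python
# Sketch of the normalise / de-normalise round trip used around the network.
import torch

depth = torch.rand(240, 320) * 0.7 + 0.3                       # fake metric depth in [0.3, 1.0]
depth_min = depth.min() - 0.5 * depth.std() - 1e-6
depth_max = depth.max() + 0.5 * depth.std() + 1e-6
depth_normed = (depth - depth_min) / (depth_max - depth_min)   # network input, roughly in (0, 1)

# De-normalise a (batched) prediction back to metric depth, as in train.py / test.py.
pred = depth_normed.unsqueeze(0)                               # stand-in for model(rgb, depth)
depth_scale = (depth_max - depth_min).reshape(-1, 1, 1)
pred_metric = pred * depth_scale + depth_min.reshape(-1, 1, 1)

assert torch.allclose(pred_metric.squeeze(0), depth, atol = 1e-5)
```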