@misc{dai2018joint,
abstract = {We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D
scans in indoor environments using a joint 3D-multi-view prediction network. In
contrast to existing methods that either use geometry or RGB data as input for
this task, we combine both data modalities in a joint, end-to-end network
architecture. Rather than simply projecting color data into a volumetric grid
and operating solely in 3D -- which would result in insufficient detail -- we
first extract feature maps from associated RGB images. These features are then
mapped into the volumetric feature grid of a 3D network using a differentiable
backprojection layer. Since our target is 3D scanning scenarios with possibly
many frames, we use a multi-view pooling approach in order to handle a varying
number of RGB input views. This learned combination of RGB and geometric
features with our joint 2D-3D architecture achieves significantly better
results than existing baselines. For instance, our final result on the ScanNet
3D segmentation benchmark increases from 52.8\% to 75\% accuracy compared to
existing volumetric architectures.},
author = {Dai, Angela and Nießner, Matthias},
biburl = {https://www.bibsonomy.org/bibtex/292d43774fd4ad862b7fab3d6ddf6f275/analyst},
description = {[1803.10409] 3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation},
keywords = {2018 3D arxiv multi-view paper segmentation semantic stanford},
  eprint = {1803.10409},
  archiveprefix = {arXiv},
title = {3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation},
url = {http://arxiv.org/abs/1803.10409},
year = 2018
}