@misc{dai2017scannet,
abstract = {A key requirement for leveraging supervised deep learning methods is the
availability of large, labeled datasets. Unfortunately, in the context of RGB-D
scene understanding, very little data is available -- current datasets cover a
small range of scene views and have limited semantic annotations. To address
this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views
in 1513 scenes annotated with 3D camera poses, surface reconstructions, and
semantic segmentations. To collect this data, we designed an easy-to-use and
scalable RGB-D capture system that includes automated surface reconstruction
and crowdsourced semantic annotation. We show that using this data helps
achieve state-of-the-art performance on several 3D scene understanding tasks,
including 3D object classification, semantic voxel labeling, and CAD model
retrieval. The dataset is freely available at http://www.scan-net.org.},
added-at = {2017-12-05T05:02:42.000+0100},
author = {Dai, Angela and Chang, Angel X. and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nießner, Matthias},
biburl = {https://www.bibsonomy.org/bibtex/29195fb474cf50ec367ae3b7c4ecf3108/achakraborty},
description = {[1702.04405] ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes},
interhash = {87f9761c1fe03a0e4fd5972ecb3387ed},
intrahash = {9195fb474cf50ec367ae3b7c4ecf3108},
keywords = {2017 3D arxiv data dataset paper reconstruction research shape},
note = {cite arxiv:1702.04405},
timestamp = {2017-12-05T05:03:13.000+0100},
title = {ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes},
url = {http://arxiv.org/abs/1702.04405},
year = 2017
}