Abstract
Our goal is to learn a deep network that, given a small number of images of
an object of a given category, reconstructs it in 3D. While several recent
works have obtained analogous results using synthetic data or assuming the
availability of 2D primitives such as keypoints, we are interested in working
with challenging real data and with no manual annotations. We thus focus on
learning a model from multiple views of a large collection of object instances.
We contribute a new large dataset of object-centric videos suitable for
training and benchmarking this class of models. We show that existing
techniques leveraging meshes, voxels, or implicit surfaces, which work well for
reconstructing isolated objects, fail on this challenging data. Finally, we
propose a new neural network design, called warp-conditioned ray embedding
(WCR), which significantly improves reconstruction while obtaining a detailed
implicit representation of the object surface and texture, also compensating
for the noise in the initial SfM reconstruction that bootstrapped the learning
process. Our evaluation demonstrates performance improvements over several deep
monocular reconstruction baselines on existing benchmarks and on our novel
dataset.