Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances
V. Guizilini, J. Li, R. Ambrus, S. Pillai, and A. Gaidon. Proceedings of the Conference on Robot Learning, volume 100 of Proceedings of Machine Learning Research, pages 503--512. PMLR, (30 Oct--01 Nov 2020)
Abstract
Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks. Initially viewed as a direct regression problem, requiring annotated labels as supervision at training time, in the past few years a substantial amount of work has been done in self-supervised depth training based on strong geometric cues, both from stereo cameras and more recently from monocular video sequences. In this paper we investigate how these two approaches (supervised & self-supervised) can be effectively combined, so that a depth model can learn to encode true scale from sparse supervision while achieving high fidelity local accuracy by leveraging geometric cues. To this end, we propose a novel supervised loss term that complements the widely used photometric loss, and show how it can be used to train robust semi-supervised monocular depth estimation models. Furthermore, we evaluate how much supervision is actually necessary to train accurate scale-aware monocular depth models, showing that with our proposed framework, very sparse LiDAR information, with as few as 4 beams (less than 100 valid depth values per image), is enough to achieve results competitive with the current state-of-the-art.
@inproceedings{2020-guizilini3,
abstract = {Dense depth estimation from a single image is a key problem in computer vision, with exciting applications in a multitude of robotic tasks. Initially viewed as a direct regression problem, requiring annotated labels as supervision at training time, in the past few years a substantial amount of work has been done in self-supervised depth training based on strong geometric cues, both from stereo cameras and more recently from monocular video sequences. In this paper we investigate how these two approaches (supervised & self-supervised) can be effectively combined, so that a depth model can learn to encode true scale from sparse supervision while achieving high fidelity local accuracy by leveraging geometric cues. To this end, we propose a novel supervised loss term that complements the widely used photometric loss, and show how it can be used to train robust semi-supervised monocular depth estimation models. Furthermore, we evaluate how much supervision is actually necessary to train accurate scale-aware monocular depth models, showing that with our proposed framework, very sparse LiDAR information, with as few as 4 beams (less than 100 valid depth values per image), is enough to achieve results competitive with the current state-of-the-art.},
author = {Guizilini, Vitor and Li, Jie and Ambrus, Rares and Pillai, Sudeep and Gaidon, Adrien},
booktitle = {Proceedings of the Conference on Robot Learning},
editor = {Kaelbling, Leslie Pack and Kragic, Danica and Sugiura, Komei},
keywords = {depth estimation, monocular, robust, semi-supervised},
month = {30 Oct--01 Nov},
pages = {503--512},
pdf = {http://proceedings.mlr.press/v100/guizilini20a/guizilini20a.pdf},
publisher = {PMLR},
series = {Proceedings of Machine Learning Research},
title = {Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances},
url = {http://proceedings.mlr.press/v100/guizilini20a.html},
volume = 100,
year = 2020
}