Abstract
Most existing algorithms for depth estimation from single monocular images need large quantities of metric ground-truth depths for supervised learning. We show that relative depth can be an informative cue for metric depth estimation and can be easily obtained from the vast amount of available stereo video. Acquiring metric depths from stereo videos, by contrast, is sometimes impractical because the camera parameters are unavailable. In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm. We introduce a new 'relative depth in stereo' (RDIS) dataset densely labeled with relative depths. We first pretrain a ResNet model on our RDIS dataset and then fine-tune it on RGB-D datasets with metric ground-truth depths. During fine-tuning, we formulate depth estimation as a classification task. This reformulation gives us the confidence of each depth prediction in the form of a probability distribution. Using this confidence, we propose an information gain loss that exploits predictions close to the ground truth. We evaluate our approach on both indoor and outdoor benchmark RGB-D datasets and achieve state-of-the-art performance.
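The classification reformulation and the information gain loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions: depth is discretized into bins spaced uniformly in log-depth, each bin's geometric midpoint is used as its depth value, and the information gain weight between a ground-truth bin and another bin is taken as the unnormalized Gaussian-style term `exp(-alpha * (i - j)^2)`. The default `alpha = 0.2` is an illustrative guess. All function names are hypothetical.

```python
import numpy as np


def log_bin_edges(d_min, d_max, num_bins):
    """Bin edges spaced uniformly in log-depth (assumed discretization)."""
    return np.linspace(np.log(d_min), np.log(d_max), num_bins + 1)


def depth_to_bins(depth, d_min, d_max, num_bins):
    """Map metric depths to class indices in [0, num_bins - 1]."""
    edges = log_bin_edges(d_min, d_max, num_bins)
    log_d = np.log(np.clip(depth, d_min, d_max))
    idx = np.searchsorted(edges, log_d, side="right") - 1
    return np.clip(idx, 0, num_bins - 1)


def bin_centers(d_min, d_max, num_bins):
    """Representative metric depth of each bin (geometric midpoint)."""
    edges = log_bin_edges(d_min, d_max, num_bins)
    return np.exp(0.5 * (edges[:-1] + edges[1:]))


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def information_gain_loss(logits, gt_bins, alpha=0.2):
    """Soft-label loss in which bins near the ground-truth bin also get credit.

    H[i, k] = exp(-alpha * (gt_bins[i] - k)^2) weights the log-probability of
    every bin k. A prediction that lands in a neighbouring bin is therefore
    penalized less than a distant one. The form of H and alpha are assumptions.
    """
    num_bins = logits.shape[-1]
    k = np.arange(num_bins)
    h = np.exp(-alpha * (gt_bins[:, None] - k[None, :]) ** 2)
    return -np.mean(np.sum(h * log_softmax(logits), axis=-1))


def predict_with_confidence(logits, centers):
    """Pick the most probable bin; its probability serves as the confidence."""
    probs = np.exp(log_softmax(logits))
    best = probs.argmax(axis=-1)
    return centers[best], probs.max(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    depths = rng.uniform(0.5, 10.0, size=8)       # toy per-pixel depths (metres)
    gt = depth_to_bins(depths, 0.5, 10.0, num_bins=50)
    logits = rng.normal(size=(8, 50))             # stand-in for network outputs
    print("loss:", information_gain_loss(logits, gt))
    depth_hat, conf = predict_with_confidence(logits, bin_centers(0.5, 10.0, 50))
    print("pred:", depth_hat.round(2), "conf:", conf.round(3))
```

As `alpha` grows, every off-target weight underflows to zero and the loss reduces to ordinary cross-entropy on the ground-truth bin. A smaller `alpha` spreads credit to neighbouring bins, which is how near-miss predictions can still contribute useful gradient.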
Original language | English |
---|---|
Article number | 8764412 |
Pages (from-to) | 2674-2682 |
Number of pages | 9 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 30 |
Issue number | 8 |
DOIs | |
Publication status | Published - Aug 2020 |
Externally published | Yes |
Keywords
- deep network
- Depth estimation
- ordinal relationship
- RGB-D dataset