Monocular Depth Estimation with Augmented Ordinal Depth Relationships

Yuanzhouhan Cao; Tianqi Zhao; Ke Xian; Chunhua Shen; Zhiguo Cao; Shugong Xu

doi:10.1109/TCSVT.2019.2929202

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)

Abstract

Most existing algorithms for depth estimation from single monocular images need large quantities of metric ground-truth depths for supervised learning. We show that relative depth can be an informative cue for metric depth estimation and can be easily obtained from vast stereo videos. Acquiring metric depths from stereo videos is sometimes impracticable due to the absence of camera parameters. In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm. We introduce a new 'relative depth in stereo' (RDIS) dataset densely labeled with relative depths. We first pretrain a ResNet model on our RDIS dataset. Then, we finetune the model on RGB-D datasets with metric ground-truth depths. During finetuning, we formulate depth estimation as a classification task. This reformulation enables us to obtain the confidence of a depth prediction in the form of a probability distribution. With this confidence, we propose an information gain loss to make use of the predictions that are close to the ground truth. We evaluate our approach on both indoor and outdoor benchmark RGB-D datasets and achieve state-of-the-art performance.
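The classification formulation and the information gain loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the binning scheme or the loss formula, so everything below is an assumption made for illustration: the log-uniform depth bins, the Gaussian-shaped weight `exp(-alpha * (k - true_bin)^2)`, the value of `alpha`, and the function names `depth_to_bin`, `softmax`, and `info_gain_loss` are all hypothetical, not taken from the paper.

```python
import math


def depth_to_bin(depth, d_min, d_max, num_bins):
    """Map a metric depth to a class index (assumed: bins uniform in log-depth)."""
    depth = min(max(depth, d_min), d_max)  # clamp into the valid range
    frac = (math.log(depth) - math.log(d_min)) / (math.log(d_max) - math.log(d_min))
    return min(int(frac * num_bins), num_bins - 1)


def softmax(logits):
    """Turn per-bin scores into a probability distribution over depth bins."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def info_gain_loss(logits, true_bin, alpha=0.2):
    """Weighted cross-entropy over all bins (assumed form).

    Bins close to the ground-truth bin receive a weight near 1, distant bins
    a weight near 0, so predictions close to the ground truth are penalized
    less than predictions far from it.
    """
    probs = softmax(logits)
    loss = 0.0
    for k, p in enumerate(probs):
        weight = math.exp(-alpha * (k - true_bin) ** 2)
        loss -= weight * math.log(p + 1e-12)
    return loss


# Usage: five bins between 0.5 m and 10 m, ground truth falls in bin 2.
true_bin = depth_to_bin(2.0, d_min=0.5, d_max=10.0, num_bins=5)
at_truth = info_gain_loss([0, 0, 5, 0, 0], true_bin)
near_miss = info_gain_loss([0, 5, 0, 0, 0], true_bin)
far_miss = info_gain_loss([5, 0, 0, 0, 0], true_bin)
confidence = max(softmax([0, 0, 5, 0, 0]))  # probability of the predicted bin
```

In this sketch the loss grows with the distance between the predicted bin and the ground-truth bin, and the softmax output doubles as the confidence of the prediction, which is the property the abstract attributes to the classification formulation.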

Original language	English
Article number	8764412
Pages (from-to)	2674-2682
Number of pages	9
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	30
Issue number	8
DOIs	https://doi.org/10.1109/TCSVT.2019.2929202
Publication status	Published - Aug 2020
Externally published	Yes

Keywords

Depth estimation
RGB-D dataset
deep network
ordinal relationship

Access to Document

10.1109/TCSVT.2019.2929202

Cite this

@article{14c5aeb70f894ae193781fe52c2b792f,

title = "Monocular Depth Estimation with Augmented Ordinal Depth Relationships",

abstract = "Most existing algorithms for depth estimation from single monocular images need large quantities of metric ground-truth depths for supervised learning. We show that relative depth can be an informative cue for metric depth estimation and can be easily obtained from vast stereo videos. Acquiring metric depths from stereo videos is sometimes impracticable due to the absence of camera parameters. In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm. We introduce a new 'relative depth in stereo' (RDIS) dataset densely labeled with relative depths. We first pretrain a ResNet model on our RDIS dataset. Then, we finetune the model on RGB-D datasets with metric ground-truth depths. During finetuning, we formulate depth estimation as a classification task. This reformulation enables us to obtain the confidence of a depth prediction in the form of a probability distribution. With this confidence, we propose an information gain loss to make use of the predictions that are close to the ground truth. We evaluate our approach on both indoor and outdoor benchmark RGB-D datasets and achieve state-of-the-art performance.",

keywords = "Depth estimation, RGB-D dataset, deep network, ordinal relationship",

author = "Yuanzhouhan Cao and Tianqi Zhao and Ke Xian and Chunhua Shen and Zhiguo Cao and Shugong Xu",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2020",

month = aug,

doi = "10.1109/TCSVT.2019.2929202",

language = "English",

volume = "30",

pages = "2674--2682",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

number = "8",

}

TY - JOUR

T1 - Monocular Depth Estimation with Augmented Ordinal Depth Relationships

AU - Cao, Yuanzhouhan

AU - Zhao, Tianqi

AU - Xian, Ke

AU - Shen, Chunhua

AU - Cao, Zhiguo

AU - Xu, Shugong

PY - 2020/8

Y1 - 2020/8

N2 - Most existing algorithms for depth estimation from single monocular images need large quantities of metric ground-truth depths for supervised learning. We show that relative depth can be an informative cue for metric depth estimation and can be easily obtained from vast stereo videos. Acquiring metric depths from stereo videos is sometimes impracticable due to the absence of camera parameters. In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm. We introduce a new 'relative depth in stereo' (RDIS) dataset densely labeled with relative depths. We first pretrain a ResNet model on our RDIS dataset. Then, we finetune the model on RGB-D datasets with metric ground-truth depths. During finetuning, we formulate depth estimation as a classification task. This reformulation enables us to obtain the confidence of a depth prediction in the form of a probability distribution. With this confidence, we propose an information gain loss to make use of the predictions that are close to the ground truth. We evaluate our approach on both indoor and outdoor benchmark RGB-D datasets and achieve state-of-the-art performance.

AB - Most existing algorithms for depth estimation from single monocular images need large quantities of metric ground-truth depths for supervised learning. We show that relative depth can be an informative cue for metric depth estimation and can be easily obtained from vast stereo videos. Acquiring metric depths from stereo videos is sometimes impracticable due to the absence of camera parameters. In this paper, we propose to improve the performance of metric depth estimation with relative depths collected from stereo movie videos using an existing stereo matching algorithm. We introduce a new 'relative depth in stereo' (RDIS) dataset densely labeled with relative depths. We first pretrain a ResNet model on our RDIS dataset. Then, we finetune the model on RGB-D datasets with metric ground-truth depths. During finetuning, we formulate depth estimation as a classification task. This reformulation enables us to obtain the confidence of a depth prediction in the form of a probability distribution. With this confidence, we propose an information gain loss to make use of the predictions that are close to the ground truth. We evaluate our approach on both indoor and outdoor benchmark RGB-D datasets and achieve state-of-the-art performance.

KW - Depth estimation

KW - RGB-D dataset

KW - deep network

KW - ordinal relationship

UR - http://www.scopus.com/inward/record.url?scp=85089502121&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2019.2929202

DO - 10.1109/TCSVT.2019.2929202

M3 - Article

AN - SCOPUS:85089502121

SN - 1051-8215

VL - 30

SP - 2674

EP - 2682

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 8

M1 - 8764412

ER -
