A lightweight real-time stereo depth estimation network with dynamic upsampling modules

Yong Deng; Jimin Xiao; Steven Zhiying Zhou

Department of Intelligent Science

National University of Singapore

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Deep learning based stereo matching networks achieve great success in the depth estimation from stereo image pairs. However, current state-of-the-art methods usually are computationally intensive, which prevents them from being applied in real-time scenarios or on mobile platforms with limited computational resources. In order to tackle this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation. Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching from the low-resolution feature maps, and then upsamples and refines the previous disparity stage by stage until the full resolution. We can take the result of any stage as output to trade off accuracy and runtime. We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose to replace the traditional upsampling method in the CTF matching scheme with the learning-based dynamic upsampling modules to avoid blurring effects caused by conventional upsampling methods. Our model can process 1242 × 375 resolution images with 35-68 FPS on a GeForce GTX 1660 GPU, and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.
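The coarse-to-fine scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: nearest-neighbour upsampling stands in for the learned dynamic upsampling modules, and the names upsample_disparity, coarse_to_fine, and residual_fns are hypothetical, introduced only for illustration. The sketch shows two points: a disparity map upsampled by a factor must have its values scaled by the same factor, and every intermediate stage is kept so that any of them could be taken as the output.

```python
import numpy as np


def upsample_disparity(disp, factor=2):
    """Nearest-neighbour upsampling of a disparity map (assumed stand-in
    for a learned upsampling module).

    Disparities are measured in pixels, so when the image width grows by
    `factor` the disparity values must be multiplied by `factor` as well.
    """
    up = np.repeat(np.repeat(disp, factor, axis=0), factor, axis=1)
    return up * factor


def coarse_to_fine(initial_disp, residual_fns):
    """Upsample and refine a coarse disparity map stage by stage.

    `residual_fns` is a hypothetical list of callables, one per stage; each
    maps the upsampled disparity to a residual correction of the same shape.
    Returns the disparity at every stage, coarsest first.
    """
    stages = [initial_disp]
    disp = initial_disp
    for refine in residual_fns:
        disp = upsample_disparity(disp)   # go to the next resolution
        disp = disp + refine(disp)        # add the predicted residual
        stages.append(disp)
    return stages


if __name__ == "__main__":
    coarse = np.ones((2, 3))                      # toy coarse disparity
    no_change = lambda d: np.zeros_like(d)        # placeholder refinement
    for stage in coarse_to_fine(coarse, [no_change, no_change]):
        print(stage.shape, float(stage.max()))
```

Stopping the loop early, or reading an earlier entry of the returned list, corresponds to trading accuracy for runtime as the abstract describes.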

Original language	English
Title of host publication	VISAPP
Editors	Giovanni Maria Farinella, Petia Radeva, Jose Braz, Kadi Bouatouch
Publisher	SciTePress
Pages	701-710
Number of pages	10
ISBN (Electronic)	9789897584886
Publication status	Published - 2021
Event	16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021 - Virtual, Online; Duration: 8 Feb 2021 → 10 Feb 2021

Publication series

Name	VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Volume	5

Conference

Conference	16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021
City	Virtual, Online
Period	8/02/21 → 10/02/21

Keywords

Deep learning
Depth estimation
Dynamic upsampling
Stereo matching

Cite this

Deng, Y., Xiao, J., & Zhou, S. Z. (2021). A lightweight real-time stereo depth estimation network with dynamic upsampling modules. In G. M. Farinella, P. Radeva, J. Braz, & K. Bouatouch (Eds.), VISAPP (pp. 701-710). (VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications; Vol. 5). SciTePress.

Deng, Yong ; Xiao, Jimin ; Zhou, Steven Zhiying. / A lightweight real-time stereo depth estimation network with dynamic upsampling modules. VISAPP. editor / Giovanni Maria Farinella ; Petia Radeva ; Jose Braz ; Kadi Bouatouch. SciTePress, 2021. pp. 701-710 (VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications).

@inproceedings{17951006859c44299b3cdae85591a4fa,
title = "A lightweight real-time stereo depth estimation network with dynamic upsampling modules",
abstract = "Deep learning based stereo matching networks achieve great success in the depth estimation from stereo image pairs. However, current state-of-the-art methods usually are computationally intensive, which prevents them from being applied in real-time scenarios or on mobile platforms with limited computational resources. In order to tackle this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation. Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching from the low-resolution feature maps, and then upsamples and refines the previous disparity stage by stage until the full resolution. We can take the result of any stage as output to trade off accuracy and runtime. We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose to replace the traditional upsampling method in the CTF matching scheme with the learning-based dynamic upsampling modules to avoid blurring effects caused by conventional upsampling methods. Our model can process 1242 × 375 resolution images with 35-68 FPS on a GeForce GTX 1660 GPU, and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.",
keywords = "Deep learning, Depth estimation, Dynamic upsampling, Stereo matching",
author = "Yong Deng and Jimin Xiao and Zhou, {Steven Zhiying}",
note = "Publisher Copyright: Copyright {\textcopyright} 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.; 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021 ; Conference date: 08-02-2021 Through 10-02-2021",
year = "2021",
language = "English",
series = "VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications",
publisher = "SciTePress",
pages = "701--710",
editor = "Farinella, {Giovanni Maria} and Petia Radeva and Jose Braz and Kadi Bouatouch",
booktitle = "VISAPP",
}

Deng, Y, Xiao, J & Zhou, SZ 2021, A lightweight real-time stereo depth estimation network with dynamic upsampling modules. in GM Farinella, P Radeva, J Braz & K Bouatouch (eds), VISAPP. VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 5, SciTePress, pp. 701-710, 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021, Virtual, Online, 8/02/21.

A lightweight real-time stereo depth estimation network with dynamic upsampling modules. / Deng, Yong; Xiao, Jimin; Zhou, Steven Zhiying.
VISAPP. ed. / Giovanni Maria Farinella; Petia Radeva; Jose Braz; Kadi Bouatouch. SciTePress, 2021. p. 701-710 (VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications; Vol. 5).

TY - GEN
T1 - A lightweight real-time stereo depth estimation network with dynamic upsampling modules
AU - Deng, Yong
AU - Xiao, Jimin
AU - Zhou, Steven Zhiying
PY - 2021
Y1 - 2021
AB - Deep learning based stereo matching networks achieve great success in the depth estimation from stereo image pairs. However, current state-of-the-art methods usually are computationally intensive, which prevents them from being applied in real-time scenarios or on mobile platforms with limited computational resources. In order to tackle this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation. Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching from the low-resolution feature maps, and then upsamples and refines the previous disparity stage by stage until the full resolution. We can take the result of any stage as output to trade off accuracy and runtime. We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose to replace the traditional upsampling method in the CTF matching scheme with the learning-based dynamic upsampling modules to avoid blurring effects caused by conventional upsampling methods. Our model can process 1242 × 375 resolution images with 35-68 FPS on a GeForce GTX 1660 GPU, and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.
KW - Deep learning
KW - Depth estimation
KW - Dynamic upsampling
KW - Stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85102976302&partnerID=8YFLogxK
M3 - Conference Proceeding
AN - SCOPUS:85102976302
T3 - VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
SP - 701
EP - 710
BT - VISAPP
A2 - Farinella, Giovanni Maria
A2 - Radeva, Petia
A2 - Braz, Jose
A2 - Bouatouch, Kadi
PB - SciTePress
T2 - 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021
Y2 - 8 February 2021 through 10 February 2021
ER -
