A lightweight real-time stereo depth estimation network with dynamic upsampling modules

Yong Deng, Jimin Xiao, Steven Zhiying Zhou

Research output: Chapter in Book or Report/Conference proceeding; Conference Proceeding; peer-review

Abstract

Deep-learning-based stereo matching networks have achieved great success in depth estimation from stereo image pairs. However, current state-of-the-art methods are usually computationally intensive, which prevents them from being applied in real-time scenarios or on mobile platforms with limited computational resources. To address this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation. Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching on low-resolution feature maps and then upsamples and refines the previous disparity estimate stage by stage until full resolution is reached. The result of any stage can be taken as the output to trade off accuracy against runtime. We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose replacing the traditional upsampling method in the CTF matching scheme with learning-based dynamic upsampling modules to avoid the blurring caused by conventional upsampling methods. Our model processes 1242 × 375 resolution images at 35-68 FPS on a GeForce GTX 1660 GPU and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.
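
The contrast between fixed and learned upsampling of a coarse disparity map can be illustrated with a minimal NumPy sketch. This is not the authors' module: the abstract gives no architectural details, so the sketch assumes a generic formulation in which each fine pixel is a weighted blend of the 3 × 3 coarse neighbourhood, with the weights (here passed in as a plain array) standing in for what a network would predict. The function names `nearest_upsample` and `dynamic_upsample` and the `weights` layout are hypothetical.

```python
import numpy as np


def nearest_upsample(disp, scale=2):
    """Fixed upsampling: repeat each coarse disparity and rescale its value."""
    up = np.repeat(np.repeat(disp, scale, axis=0), scale, axis=1)
    # Disparities are measured in pixels, so they grow with the resolution.
    return up * scale


def dynamic_upsample(disp, weights, scale=2):
    """Per-pixel weighted blend of the 3x3 coarse neighbourhood (assumed form).

    disp:    (H, W) coarse disparity map.
    weights: (H*scale, W*scale, 9) blending weights, each summing to 1;
             in a real network these would be predicted, here they are given.
    """
    h, w = disp.shape
    padded = np.pad(disp, 1, mode="edge")
    # neighbours[y, x, k] is the k-th 3x3 neighbour of coarse pixel (y, x);
    # k = 4 is the centre pixel itself.
    neighbours = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=-1,
    )
    # Each fine pixel reads the neighbourhood of the coarse pixel it falls in.
    fine = np.repeat(np.repeat(neighbours, scale, axis=0), scale, axis=1)
    return scale * np.sum(weights * fine, axis=-1)


coarse = np.array([[1.0, 2.0], [3.0, 4.0]])
centre_only = np.zeros((4, 4, 9))
centre_only[..., 4] = 1.0  # all weight on the centre neighbour
print(dynamic_upsample(coarse, centre_only))
```

With all weight on the centre neighbour the blend reduces to nearest-neighbour upsampling; other weight patterns let each fine pixel pick the side of a depth edge it belongs to, which is the intuition behind avoiding the blur of fixed interpolation.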

Original language: English
Title of host publication: VISAPP
Editors: Giovanni Maria Farinella, Petia Radeva, Jose Braz, Kadi Bouatouch
Publisher: SciTePress
Pages: 701-710
Number of pages: 10
ISBN (Electronic): 9789897584886
Publication status: Published - 2021
Event: 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021 - Virtual, Online
Duration: 8 Feb 2021 to 10 Feb 2021

Publication series

Name: VISIGRAPP 2021 - Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
Volume: 5

Conference

Conference: 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2021
City: Virtual, Online
Period: 8/02/21 to 10/02/21

Keywords

  • Deep learning
  • Depth estimation
  • Dynamic upsampling
  • Stereo matching
