TY - JOUR
T1 - ToF and Stereo Data Fusion Using Dynamic Search Range Stereo Matching
AU - Deng, Yong
AU - Xiao, Jimin
AU - Zhou, Steven Zhiying
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Time-of-Flight (ToF) sensors and stereo vision systems are both widely used for capturing depth data. Their strengths and limitations are complementary, which prior research has exploited to produce more accurate depth maps by fusing data from the two sources. However, none of these diverse fusion approaches provides an end-to-end neural network solution. In this work, we propose the first end-to-end ToF and stereo data fusion network built on the coarse-to-fine matching framework, where the ToF depth prior is integrated into the stereo matching process by constraining the stereo search range to an interval around the ToF depth measurement. We adopt a dynamic per-pixel search range derived from an estimated ToF error map, which is more efficient and effective than a constant range when handling errors of varying magnitude. The ToF error map is produced by a ToF error estimator branching out from the stereo matching network, so ToF error estimation and stereo matching are performed in a joint framework, with the two tasks reinforcing each other. We also propose an upsampling module that replaces the naive bilinear upsampling in the coarse-to-fine stereo matching network, reducing upsampling-induced error. The proposed deep network is trained end-to-end on synthetic datasets and generalizes to real-world datasets without further fine-tuning. Experimental results show that our fusion method achieves higher accuracy than either ToF or stereo alone and outperforms state-of-the-art fusion methods on both synthetic and real data.
KW - Time-of-flight
KW - fusion
KW - neural network
KW - stereo matching
UR - http://www.scopus.com/inward/record.url?scp=85111062112&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3087017
DO - 10.1109/TMM.2021.3087017
M3 - Article
AN - SCOPUS:85111062112
SN - 1520-9210
VL - 24
SP - 2739
EP - 2751
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -