TY - GEN
T1 - Coarse-to-Fine Document Image Registration for Dewarping
AU - Zhang, Weiguang
AU - Wang, Qiufeng
AU - Huang, Kaizhu
AU - Gu, Xiaomeng
AU - Guo, Fengjun
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - Document dewarping has made great progress in recent years, but it usually requires a huge number of document pairs with pixel-level annotation to learn a mapping function. Although photographed document images are easy to obtain, pixel-level annotation between warped and flat images is time-consuming and almost impossible for large-scale datasets. To overcome this issue, we propose to register photographed documents with their corresponding flat counterparts, obtaining pixel-level mapping labels automatically. Because real photographed documents are severely deformed, we introduce a coarse-to-fine registration pipeline in which the coarse stage learns the global-scale transformation and the fine stage aligns local details. In addition, the lack of registration labels motivates us to tailor a teacher-student dual branch under semi-supervised training, where the model is initialized on synthetic documents with labels. Furthermore, we contribute a large-scale dataset containing 12,500 triplets of synthetic-real-flat documents. Extensive experiments demonstrate the effectiveness of our proposed registration method. Specifically, when trained on our registered documents with pixel-level labels, the dewarping model achieves performance comparable to state-of-the-art methods trained on almost 100× as many samples, showing the high quality of our registration results. Our dataset and code are available at https://github.com/hanquansanren/DIRD.
KW - Coarse-to-Fine
KW - Document Dewarping
KW - Document Registration
KW - Semi-supervised Learning
UR - http://www.scopus.com/inward/record.url?scp=85204635035&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-70546-5_20
DO - 10.1007/978-3-031-70546-5_20
M3 - Conference Proceeding
AN - SCOPUS:85204635035
SN - 9783031705458
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 343
EP - 358
BT - Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Proceedings
A2 - Barney Smith, Elisa H.
A2 - Liwicki, Marcus
A2 - Peng, Liangrui
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on Document Analysis and Recognition, ICDAR 2024
Y2 - 30 August 2024 through 4 September 2024
ER -