TY - GEN
T1 - A Segment-Based Layout Aware Model for Information Extraction on Document Images
AU - Ning, Maizhen
AU - Wang, Qiu Feng
AU - Huang, Kaizhu
AU - Huang, Xiaowei
N1 - Funding Information:
Acknowledgments. The work was partially supported by the following: National Natural Science Foundation of China under no. 61876154 and no. 61876155; Jiangsu Science and Technology Programme (Natural Science Foundation of Jiangsu Province) under no. BE2020006-4 and BK20181190; Key Program Special Fund in XJTLU under no. KSF-T-06, KSF-E-26, and KSF-A-10; and XJTLU Research Development Fund RDF-16-02-49.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Information extraction (IE) on document images has attracted considerable attention recently due to its great potential for intelligent document analysis, where visual layout information is vital. However, most existing works mainly consider visual layout information at the token level, which unfortunately ignores long contexts and requires time-consuming annotation. In this paper, we propose to model document visual layout information at the segment level. First, we obtain the segment representation by integrating segment-level layout information and text embeddings. Since only segment-level layout annotation is needed, our model enjoys a lower annotation cost than the full annotation required at the token level. Then, word vectors are also extracted from each text segment to obtain a fine-grained representation. Finally, both segment and word vectors are fused to obtain the prediction results. Extensive experiments on benchmark datasets demonstrate the effectiveness of our novel method.
AB - Information extraction (IE) on document images has attracted considerable attention recently due to its great potential for intelligent document analysis, where visual layout information is vital. However, most existing works mainly consider visual layout information at the token level, which unfortunately ignores long contexts and requires time-consuming annotation. In this paper, we propose to model document visual layout information at the segment level. First, we obtain the segment representation by integrating segment-level layout information and text embeddings. Since only segment-level layout annotation is needed, our model enjoys a lower annotation cost than the full annotation required at the token level. Then, word vectors are also extracted from each text segment to obtain a fine-grained representation. Finally, both segment and word vectors are fused to obtain the prediction results. Extensive experiments on benchmark datasets demonstrate the effectiveness of our novel method.
KW - Document intelligence
KW - Information extraction
KW - Segment representation
KW - Visual layout information
KW - Weak annotation
UR - http://www.scopus.com/inward/record.url?scp=85121915498&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-92307-5_88
DO - 10.1007/978-3-030-92307-5_88
M3 - Conference Proceeding
AN - SCOPUS:85121915498
SN - 9783030923068
T3 - Communications in Computer and Information Science
SP - 757
EP - 765
BT - Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
A2 - Mantoro, Teddy
A2 - Lee, Minho
A2 - Ayu, Media Anugerah
A2 - Wong, Kok Wai
A2 - Hidayanto, Achmad Nizar
PB - Springer Science and Business Media Deutschland GmbH
T2 - 28th International Conference on Neural Information Processing, ICONIP 2021
Y2 - 8 December 2021 through 12 December 2021
ER -