TY - JOUR
T1 - PuzText: Self-Supervised Learning of Permuted Texture Representation for Multilingual Text Recognition
T2 - IEEE Access
AU - Lu, Minjun
AU - Xu, Shugong
AU - Zhang, Xuefan
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
AB - In recent years, text recognition has made significant progress with the rapid development of deep learning techniques, including pre-training schemes. However, previous pre-training methods treat texts in different languages separately. In this paper, we propose PuzText, a unified self-supervised pre-training method for understanding textures and strokes, which are consistent across languages. By reconstructing permuted patches of text images, PuzText forces the model to learn positional relationships and fine-grained details from different parts of the input text image. Moreover, the method can reconstruct never-before-seen images of human languages, demonstrating strong generalization and promising applications. In addition, instead of multiplexing recognition heads over different languages, we propose a global Character-Aware Gate (CAG) that learns which characters each language uses. Fused with implicit character-aware representations, a unified language model constructs semantic information for different languages from their specific characters. Experiments on several public benchmarks show that our method significantly outperforms previous approaches on end-to-end multilingual text recognition.
KW - Multilingual text recognition
KW - Script identification
KW - Self-supervised learning
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85211476687&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3509678
DO - 10.1109/ACCESS.2024.3509678
M3 - Article
AN - SCOPUS:85211476687
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -