PuzText: Self-Supervised Learning of Permuted Texture Representation for Multilingual Text Recognition

Minjun Lu, Shugong Xu*, Xuefan Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, text recognition has made significant progress with the rapid development of deep learning techniques, including pre-training schemes. However, previous pre-training methods treat texts in different languages separately. In this paper, we propose a unified self-supervised pre-training method, PuzText, for learning texture and stroke representations that are consistent across languages. By reconstructing permuted patches of text images, PuzText forces the model to learn positional relationships and fine-grained details across different parts of the input text images. Furthermore, the method can reconstruct text images in languages never seen during pre-training, demonstrating strong generalization and broad application prospects. In addition, instead of separate recognition heads for each language, we propose a global Character-Aware Gate (CAG) that learns which characters are used by each language. Fused with these implicit character-aware representations, a unified language model constructs semantic information for each language over its specific character set. Experiments on several public benchmarks show that our method significantly outperforms previous approaches on end-to-end multilingual text recognition.
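The abstract states the two ideas but not their implementation. The PyTorch sketches below illustrate, under our own assumptions, (i) a jigsaw-style permuted-patch reconstruction objective and (ii) a gated fusion module in the spirit of the Character-Aware Gate. All class names, shapes, and hyperparameters are illustrative placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PermutedPatchPretext(nn.Module):
    """Sketch of a permuted-patch pretext task: shuffle the patches of a
    text image and train a Transformer encoder to reconstruct the patches
    in their original order. Hyperparameters are assumptions."""

    def __init__(self, num_patches=32, patch_dim=3 * 8 * 8, dim=256):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.reconstruct = nn.Linear(dim, patch_dim)  # pixel-space prediction

    def forward(self, patches):
        # patches: (B, N, patch_dim), flattened patches in original order,
        # with N equal to num_patches.
        B, N, D = patches.shape
        # Per-sample random permutation of the patch order.
        perm = torch.stack(
            [torch.randperm(N, device=patches.device) for _ in range(B)]
        )
        shuffled = torch.gather(patches, 1, perm.unsqueeze(-1).expand(B, N, D))
        feats = self.encoder(self.embed(shuffled) + self.pos)
        # Slot i of the output should recover the i-th patch of the
        # unshuffled image, forcing the model to infer patch positions.
        pred = self.reconstruct(feats)
        return F.mse_loss(pred, patches)
```

For the CAG, the abstract only says it learns the characters used by each language and fuses implicit character-aware representations into a unified language model. One plausible realization is a sigmoid gate over concatenated visual and character-set features; again, this is a hedged guess at the design, not the published module:

```python
class CharacterAwareGate(nn.Module):
    """Hypothetical gated fusion: project a character-occurrence vector
    into the feature space and blend it with visual features, so one
    language model can focus on the active language's character set."""

    def __init__(self, dim, charset_size):
        super().__init__()
        self.char_proj = nn.Linear(charset_size, dim)  # implicit character-aware embedding
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual_feats, char_probs):
        # visual_feats: (B, T, dim); char_probs: (B, charset_size),
        # e.g. predicted probabilities over the global character set.
        char_repr = self.char_proj(char_probs).unsqueeze(1).expand_as(visual_feats)
        g = self.gate(torch.cat([visual_feats, char_repr], dim=-1))
        return g * visual_feats + (1.0 - g) * char_repr  # gated fusion
```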

Original language: English
Journal: IEEE Access
DOIs
Publication status: Accepted/In press - 2024
Externally published: Yes

Keywords

  • Multilingual text recognition
  • Script identification
  • Self-supervised learning
  • Transformer

