STR Transformer: A Cross-domain Transformer for Scene Text Recognition

Xing Wu*, Bin Tang, Ming Zhao, Jianjia Wang, Yike Guo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Scene text recognition, which aims to extract text from images, is an indispensable part of computer vision. However, extracting text that conforms to spelling rules remains a challenge. We propose a cross-domain Transformer, called STR Transformer (STRT), which not only extracts text from an image but also corrects characters according to spelling rules. Specifically, we propose a Spline Transformer that extracts hierarchical image features without convolution layers; it offers the flexibility to build models at various scales and has linear computational complexity with respect to image size. Furthermore, an iterative Text Transformer is designed to predict the probability distribution of the current character in the character sequence, which effectively reduces the impact of noise. Extensive experiments demonstrate that the proposed STRT outperforms state-of-the-art methods on various scene text recognition benchmark datasets. Qualitative and quantitative analyses confirm the effectiveness and efficiency of the proposed STRT method.

Original language: English
Pages (from-to): 3444-3458
Number of pages: 15
Journal: Applied Intelligence
Volume: 53
Issue number: 3
Publication status: Published - Feb 2023
Externally published: Yes

Keywords

  • Cross-domain
  • Hierarchical feature
  • Scene text recognition
  • Transformer
