TY - JOUR
T1 - STR Transformer: A Cross-domain Transformer for Scene Text Recognition
AU - Wu, Xing
AU - Tang, Bin
AU - Zhao, Ming
AU - Wang, Jianjia
AU - Guo, Yike
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/2
Y1 - 2023/2
N2 - Scene text recognition is an indispensable part of computer vision, which aims to extract text information from an image. However, effective extraction of texts following spelling rules remains a challenge for scene text recognition. We propose a cross-domain Transformer, called STR Transformer (STRT), which can not only extract texts from an image but also correct characters effectively according to their spelling rules. Specifically, we propose a Spline Transformer to extract hierarchical features of images without convolution layers, which has the flexibility to build models with various scales and has linear computational complexity with respect to image size. Furthermore, an iterative Text Transformer is designed to predict the probability distribution of the current character in the character sequence, which can effectively reduce the impact of noise. Extensive experiments demonstrate that the proposed STRT outperforms state-of-the-art methods on various benchmark datasets of scene text recognition. The qualitative and quantitative analysis proves the effectiveness and efficiency of the proposed STRT method.
KW - Cross-domain
KW - Hierarchical feature
KW - Scene text recognition
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85131092410&partnerID=8YFLogxK
U2 - 10.1007/s10489-022-03728-5
DO - 10.1007/s10489-022-03728-5
M3 - Article
AN - SCOPUS:85131092410
SN - 0924-669X
VL - 53
SP - 3444
EP - 3458
JO - Applied Intelligence
JF - Applied Intelligence
IS - 3
ER -