TY - JOUR
T1 - Residual attention-based multi-scale script identification in scene text images
AU - Ma, Mengkai
AU - Wang, Qiu Feng
AU - Huang, Shan
AU - Huang, Shen
AU - Goulermas, Yannis
AU - Huang, Kaizhu
N1 - Funding Information:
This work is supported by National Natural Science Foundation of China (NSFC) under no. 61876154 and 61876155; Natural Science Foundation of Jiangsu Province under no. BK20181189, BK20181190, BE2020006-4; Key Program Special Fund in XJTLU under no. KSF-A-10, KSF-A-01, KSF-T-06, KSF-E-26, KSF-P-02; XJTLU Research Development Fund under no. RDF-16-02-49, and CCF-Tencent Open Research Fund under no. RAGR20180109.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2021/1/15
Y1 - 2021/1/15
N2 - Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Owing to complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each carefully designed component. Finally, we achieve better performance than competing models, with correct classification rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.
AB - Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Owing to complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each carefully designed component. Finally, we achieve better performance than competing models, with correct classification rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.
KW - Attention mechanism
KW - Feature fusion
KW - Global max pooling
KW - Multi-scale features
KW - Script identification
UR - http://www.scopus.com/inward/record.url?scp=85092722048&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2020.09.015
DO - 10.1016/j.neucom.2020.09.015
M3 - Article
AN - SCOPUS:85092722048
SN - 0925-2312
VL - 421
SP - 222
EP - 233
JO - Neurocomputing
JF - Neurocomputing
ER -