Residual attention-based multi-scale script identification in scene text images

doi:10.1016/j.neucom.2020.09.015

Mengkai Ma, Qiu Feng Wang^*, Shan Huang, Shen Huang, Yannis Goulermas, Kaizhu Huang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.
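The classification layer described in the abstract (a fully convolutional classifier producing one map per class, followed by global pooling) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single 1x1 convolution as the classifier and global max pooling as the pooling step (suggested only by the "Global max pooling" keyword). All function names, shapes, and toy values below are hypothetical.

```python
import math


def conv1x1_classifier(features, weights, bias):
    """Apply a 1x1 convolution: features[c][y][x] -> class_maps[k][y][x].

    A 1x1 convolution is a per-pixel linear map over channels, so every
    spatial position gets its own score for each class.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[k] + sum(weights[k][c] * features[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for k in range(len(weights))
    ]


def global_max_pool(class_maps):
    """Reduce each class map to its single strongest response."""
    return [max(max(row) for row in class_map) for class_map in class_maps]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Class maps -> pooled logits -> class probabilities."""
    return softmax(global_max_pool(conv1x1_classifier(features, weights, bias)))


# Hypothetical toy input: 2 feature channels on a 2x3 map, 2 script classes.
features = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 5.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity mapping from channels to classes
bias = [0.0, 0.0]

probs = classify(features, weights, bias)
predicted_class = probs.index(max(probs))
```

Because pooling removes the spatial dimensions, a classifier of this shape needs no fully connected layer and accepts feature maps of any width, which is convenient for text crops of varying length.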

Original language	English
Pages (from-to)	222-233
Number of pages	12
Journal	Neurocomputing
Volume	421
DOIs	https://doi.org/10.1016/j.neucom.2020.09.015
Publication status	Published - 15 Jan 2021

Keywords

Attention mechanism
Feature fusion
Global max pooling
Multi-scale features
Script identification

Cite this

@article{6854f81e045e422e84f8c11291ad624a,
title = "Residual attention-based multi-scale script identification in scene text images",
abstract = "Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.",
keywords = "Attention mechanism, Feature fusion, Global max pooling, Multi-scale features, Script identification",
author = "Mengkai Ma and Wang, {Qiu Feng} and Shan Huang and Shen Huang and Yannis Goulermas and Kaizhu Huang",
note = "Funding Information: This work is supported by National Natural Science Foundation of China (NSFC) under no. 61876154 and 61876155; Natural Science Foundation of Jiangsu Province under no. BK20181189, BK20181190, BE2020006-4; Key Program Special Fund in XJTLU under no. KSF-A-10, KSF-A-01, KSF-T-06, KSF-E-26, KSF-P-02; XJTLU Research Development Fund under no. RDF-16-02-49, and CCF-Tencent Open Research Fund under no. RAGR20180109. Publisher Copyright: {\textcopyright} 2020 Elsevier B.V.",
year = "2021",
month = jan,
day = "15",
doi = "10.1016/j.neucom.2020.09.015",
language = "English",
volume = "421",
pages = "222--233",
journal = "Neurocomputing",
issn = "0925-2312",
}

TY - JOUR
T1 - Residual attention-based multi-scale script identification in scene text images
AU - Ma, Mengkai
AU - Wang, Qiu Feng
AU - Huang, Shan
AU - Huang, Shen
AU - Goulermas, Yannis
AU - Huang, Kaizhu
N1 - Funding Information: This work is supported by National Natural Science Foundation of China (NSFC) under no. 61876154 and 61876155; Natural Science Foundation of Jiangsu Province under no. BK20181189, BK20181190, BE2020006-4; Key Program Special Fund in XJTLU under no. KSF-A-10, KSF-A-01, KSF-T-06, KSF-E-26, KSF-P-02; XJTLU Research Development Fund under no. RDF-16-02-49, and CCF-Tencent Open Research Fund under no. RAGR20180109. Publisher Copyright: © 2020 Elsevier B.V.
PY - 2021/1/15
Y1 - 2021/1/15
N2 - Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.
AB - Script identification is an essential step in the text extraction pipeline for multilingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and character similarity across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to capture the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively.
KW - Attention mechanism
KW - Feature fusion
KW - Global max pooling
KW - Multi-scale features
KW - Script identification
UR - http://www.scopus.com/inward/record.url?scp=85092722048&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2020.09.015
DO - 10.1016/j.neucom.2020.09.015
M3 - Article
AN - SCOPUS:85092722048
SN - 0925-2312
VL - 421
SP - 222
EP - 233
JO - Neurocomputing
JF - Neurocomputing
ER -
