TY - GEN
T1 - Mix-Up Augmentation for Oracle Character Recognition with Imbalanced Data Distribution
AU - Li, Jing
AU - Wang, Qiu Feng
AU - Zhang, Rui
AU - Huang, Kaizhu
N1 - Funding Information:
Acknowledgements. This work was partially supported by the following: National Natural Science Foundation of China under no. 61876155 and no. 61876154; Jiangsu Science and Technology Programme (Natural Science Foundation of Jiangsu Province) under no. BE2020006-4B, BK20181189, and BK20181190; Key Program Special Fund in XJTLU under no. KSF-T-06, KSF-E-26, and KSF-A-10; and the open program of Henan Key Laboratory of Oracle Bone Inscription Information Processing (AnYang Normal University) under no. OIP2019H001.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Oracle bone characters are probably the oldest hieroglyphs in China. Recognizing these characters is important because they provide valuable clues for Chinese archaeology and philology. Automatic oracle bone character recognition, however, remains a challenging problem. In particular, due to their inherited nature, oracle characters are typically very limited in number and seriously imbalanced in most available oracle datasets, which greatly hinders research on automatic oracle bone character recognition. To alleviate this problem, we propose a mix-up strategy that leverages information from both majority and minority classes to augment samples of minority classes, so that their boundaries are pushed outward, towards the majority classes. As a result, the training bias caused by majority classes can be largely reduced. In addition, we consolidate our new framework with both the softmax loss and the triplet loss on the augmented samples, which further improves classification accuracy. We conduct extensive evaluations with respect to both total class accuracy and average class accuracy on three benchmark datasets (Oracle-20K, Oracle-AYNU and OBC306). Experimental results show that the proposed method outperforms the comparison approaches, attaining a new state of the art in oracle bone character recognition.
KW - Imbalanced data
KW - Mix-up augmentation
KW - Oracle character recognition
KW - Triplet loss
UR - http://www.scopus.com/inward/record.url?scp=85115255753&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-86549-8_16
DO - 10.1007/978-3-030-86549-8_16
M3 - Conference Proceeding
AN - SCOPUS:85115255753
SN - 9783030865481
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 237
EP - 251
BT - Document Analysis and Recognition - ICDAR 2021 - 16th International Conference, Proceedings
A2 - Lladós, Josep
A2 - Lopresti, Daniel
A2 - Uchida, Seiichi
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th International Conference on Document Analysis and Recognition, ICDAR 2021
Y2 - 5 September 2021 through 10 September 2021
ER -