Mix-Up Augmentation for Oracle Character Recognition with Imbalanced Data Distribution

Jing Li, Qiu Feng Wang*, Rui Zhang, Kaizhu Huang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

7 Citations (Scopus)

Abstract

Oracle bone characters are probably the oldest hieroglyphs in China. Recognizing such characters is of great significance, since they can provide important clues for Chinese archaeology and philology. Automatic oracle bone character recognition, however, remains a challenging problem. In particular, owing to their inherent nature, oracle character samples are typically very limited and seriously imbalanced in most available oracle datasets, which greatly hinders research in automatic oracle bone character recognition. To alleviate this problem, we propose a mix-up strategy that leverages information from both majority and minority classes to augment samples of minority classes, such that their boundaries can be pushed toward the majority classes. As a result, the training bias resulting from majority classes can be largely reduced. In addition, we consolidate our new framework with both the softmax loss and the triplet loss on the augmented samples, which proves able to improve the classification accuracy further. We conduct extensive evaluations with respect to both total class accuracy and average class accuracy on three benchmark datasets (i.e., Oracle-20K, Oracle-AYNU and OBC306). Experimental results show that the proposed method achieves superior performance to the compared approaches, attaining a new state of the art in oracle bone character recognition.
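
As a rough illustration of the mix-up idea described in the abstract, the following Python sketch blends one minority-class sample with one majority-class sample using the standard mixup interpolation. The abstract does not give the exact formula, so the function name `mixup_pair`, the Beta(alpha, alpha) sampling of the mixing weight, and the choice to keep the minority sample dominant are all assumptions made for illustration, not the authors' implementation.

```python
import random


def mixup_pair(x_minor, x_major, y_minor, y_major, num_classes,
               alpha=1.0, rng=random):
    """Blend a minority-class sample with a majority-class sample.

    Hypothetical helper: inputs are flat feature lists of equal length,
    labels are integer class indices. Returns the mixed features, the
    soft label vector, and the mixing weight that was drawn.
    """
    # Standard mixup draws the mixing weight from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Assumption: keep the minority sample dominant so the augmented
    # sample still mostly represents the minority class.
    lam = max(lam, 1.0 - lam)

    # Convex combination of the two inputs.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_minor, x_major)]

    # Soft label built from the same weights.
    y_mix = [0.0] * num_classes
    y_mix[y_minor] += lam
    y_mix[y_major] += 1.0 - lam
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for a repeatable demo
    x_mix, y_mix, lam = mixup_pair([1.0, 0.0], [0.0, 1.0],
                                   y_minor=2, y_major=0,
                                   num_classes=3, rng=rng)
    print(lam, x_mix, y_mix)
```

In a training loop, such mixed samples would be added to the minority classes before computing the classification loss; how the paper combines them with the softmax and triplet losses is not specified in the abstract.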

Original language: English
Title of host publication: Document Analysis and Recognition - ICDAR 2021 - 16th International Conference, Proceedings
Editors: Josep Lladós, Daniel Lopresti, Seiichi Uchida
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 237-251
Number of pages: 15
ISBN (Print): 9783030865481
Publication status: Published - 2021
Event: 16th International Conference on Document Analysis and Recognition, ICDAR 2021 - Lausanne, Switzerland
Duration: 5 Sept 2021 → 10 Sept 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12821 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Document Analysis and Recognition, ICDAR 2021
Country/Territory: Switzerland
City: Lausanne
Period: 5/09/21 → 10/09/21

Keywords

  • Imbalanced data
  • Mix-up augmentation
  • Oracle character recognition
  • Triplet loss
