Towards better long-tailed oracle character recognition with adversarial data augmentation

Jing Li, Qiu Feng Wang*, Kaizhu Huang, Xi Yang, Rui Zhang, John Y. Goulermas

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Deciphering oracle bone script is of great significance to the study of ancient Chinese culture as well as archaeology. Although recent studies on oracle character recognition have made substantial progress, they still suffer from long-tailed data distributions, which cause a noticeable performance drop on the tail classes. To mitigate this issue, we propose a generative adversarial framework to augment oracle characters in the problematic classes. In this framework, the generator produces synthetic data through convex combinations of all the available samples in the corresponding classes, and is further optimized through adversarial learning with both the classifier and the discriminator. Meanwhile, we introduce Repatch to generalize samples in the generator. Since tail classes do not have sufficient data for convex combinations, we propose the TailMix mechanism to generate suitable tail class samples from other classes. Experimental results show that our proposed algorithm obtains remarkable performance in oracle character recognition and achieves new state-of-the-art average (total) accuracy of 86.03% (89.46%), 86.54% (93.86%), and 95.22% (96.17%) on the three datasets Oracle-AYNU, OBC306, and Oracle-20K, respectively.
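The convex-combination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `convex_combination`, the use of a softmax to obtain the weights, the random 8x8 arrays standing in for character images, and the random scores standing in for learned generator outputs are all assumptions made for illustration. Repatch, TailMix, and the adversarial training loop are not shown.

```python
import numpy as np

def convex_combination(samples, logits):
    """Blend same-class images using non-negative weights that sum to 1."""
    samples = np.asarray(samples, dtype=np.float64)  # shape (n, H, W)
    logits = np.asarray(logits, dtype=np.float64)    # shape (n,)
    # Softmax (an assumed choice) maps arbitrary scores to convex weights.
    shifted = logits - logits.max()
    weights = np.exp(shifted) / np.exp(shifted).sum()
    # Contract the weight vector with the sample axis: one synthetic image.
    synthetic = np.tensordot(weights, samples, axes=1)
    return synthetic, weights

rng = np.random.default_rng(0)
class_samples = rng.random((5, 8, 8))  # hypothetical: 5 images of one class
logits = rng.normal(size=5)            # hypothetical stand-in for learned scores
synthetic, weights = convex_combination(class_samples, logits)
```

Because the weights are non-negative and sum to one, every pixel of `synthetic` lies between the smallest and largest value of that pixel across the input samples, which is what keeps the blended image inside the span of the class's existing data.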

Original language: English
Article number: 109534
Journal: Pattern Recognition
Volume: 140
Publication status: Published - Aug 2023

Keywords

  • Data augmentation
  • Data imbalance
  • Generative adversarial networks
  • Long tail
  • Mixup strategy
  • Oracle character recognition
