Abstract
Deciphering oracle bone script is of great significance to the study of ancient Chinese culture as well as archaeology. Although recent studies on oracle character recognition have made substantial progress, they still suffer from the long-tailed data situation that results in a noticeable performance drop on the tail classes. To mitigate this issue, we propose a generative adversarial framework to augment oracle characters in the problematic classes. In this framework, the generator produces synthetic data through convex combinations of all the available samples in the corresponding classes, and is further optimized through adversarial learning with the classifier and simultaneously the discriminator. Meanwhile, we introduce Repatch to generalize samples in the generator. Since tail classes do not have sufficient data for convex combinations, we propose the TailMix mechanism to generate suitable tail class samples from other classes. Experimental results show that our proposed algorithm obtains remarkable performance in oracle character recognition and achieves new state-of-the-art average (total) accuracy with 86.03% (89.46%), 86.54% (93.86%), 95.22% (96.17%) on the three datasets Oracle-AYNU, OBC306 and Oracle-20K, respectively.
Original language | English |
---|---|
Article number | 109534 |
Journal | Pattern Recognition |
Volume | 140 |
DOIs | |
Publication status | Published - Aug 2023 |
Keywords
- Data augmentation
- Data imbalance
- Generative adversarial networks
- Long tail
- Mixup strategy
- Oracle character recognition