EID-GAN: Generative Adversarial Nets for Extremely Imbalanced Data Augmentation

Wei Li; Jinlin Chen; Jiannong Cao; Chao Ma; Jia Wang; Xiaohui Cui; Ping Chen

doi:10.1109/TII.2022.3182781

Xi'an Jiaotong-Liverpool University

Research output: Contribution to journal › Article › peer-review

39 Citations (Scopus)

Abstract

Imbalanced data causes deep neural networks to output biased results, and it becomes more serious when facing extremely imbalanced data regarding the outliers with tiny size (the ratio of the outlier size to the image size is around 0.05%). Many data augmentation models are proposed to supplement imbalanced data to alleviate biased results. However, the existing augmentation models cannot synthesize tiny outliers, which makes the generated data unavailable. In this paper, we propose a new augmentation model named Extremely Imbalanced Data Augmentation Generative Adversarial Nets (EID-GAN) to address the extremely imbalanced data augmentation problem. First, we design a new penalty function by subtracting the outliers from the cropped region of the generated instance to guide the generator to learn the features of outliers. After that, we combine the output value of the penalty function with the generator loss to jointly update the generator’s parameters with back-propagation. Second, we propose a new evaluation approach that adopts two outlier detectors with k-fold cross-validation to assess the availability of generated instances. We conduct extensive experiments to demonstrate the significant performance improvement of EID-GAN on two extremely imbalanced datasets: industrial Piston and Fabric datasets; and one general imbalanced dataset: the public DAGM dataset. The experimental results show that our EID-GAN outperforms the SOTA augmentation models on different imbalanced datasets.
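
The penalty idea in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract does not give the exact formula, so the L2 (Frobenius) norm, the weight `lam`, the crop coordinates `top` and `left`, and the function names `norm_penalty` and `generator_objective` are all assumptions made for illustration.

```python
import numpy as np


def norm_penalty(generated, outlier_patch, top, left):
    """Norm of (cropped generated region - real outlier patch).

    Assumption: the penalty is an L2 (Frobenius) norm; the abstract only
    says the outliers are subtracted from the cropped region.
    """
    h, w = outlier_patch.shape
    crop = generated[top:top + h, left:left + w]
    if crop.shape != outlier_patch.shape:
        raise ValueError("crop region falls outside the generated image")
    return float(np.linalg.norm(crop - outlier_patch))


def generator_objective(adv_loss, generated, outlier_patch, top, left, lam=1.0):
    """Combine the adversarial generator loss with the weighted penalty.

    Assumption: a simple weighted sum with hypothetical weight `lam`.
    """
    return adv_loss + lam * norm_penalty(generated, outlier_patch, top, left)


# Toy example: an 8x8 generated image with no outlier, and a 2x2 real outlier.
generated = np.zeros((8, 8))
outlier_patch = np.ones((2, 2))
print(norm_penalty(generated, outlier_patch, top=3, left=3))
print(generator_objective(0.5, generated, outlier_patch, 3, 3, lam=0.1))
```

In this toy setup the penalty falls to zero once the cropped region of the generated image matches the outlier patch, which is the sense in which such a term could steer a generator toward reproducing tiny outliers. In actual training the same computation would have to be written in an automatic-differentiation framework so the penalty can be back-propagated.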

Original language	English
Pages (from-to)	1-10
Number of pages	10
Journal	IEEE Transactions on Industrial Informatics
Volume	19
Issue number	3
DOIs	https://doi.org/10.1109/TII.2022.3182781
Publication status	Accepted/In press - 2022

Keywords

Data models
Detectors
Extremely imbalanced data augmentation
Fabrics
GAN
generated data evaluation
Generators
norm penalty function
Pistons
Prototypes
Training

Cite this

Li, W., Chen, J., Cao, J., Ma, C., Wang, J., Cui, X., & Chen, P. (Accepted/In press). EID-GAN: Generative Adversarial Nets for Extremely Imbalanced Data Augmentation. IEEE Transactions on Industrial Informatics, 19(3), 1-10. https://doi.org/10.1109/TII.2022.3182781

@article{261bdd6c7ffb488a99311cde75c6941b,

title = "EID-GAN: Generative Adversarial Nets for Extremely Imbalanced Data Augmentation",

abstract = "Imbalanced data causes deep neural networks to output biased results, and it becomes more serious when facing extremely imbalanced data regarding the outliers with tiny size (the ratio of the outlier size to the image size is around 0.05%). Many data augmentation models are proposed to supplement imbalanced data to alleviate biased results. However, the existing augmentation models cannot synthesize tiny outliers, which makes the generated data unavailable. In this paper, we propose a new augmentation model named Extremely Imbalanced Data Augmentation Generative Adversarial Nets (EID-GAN) to address the extremely imbalanced data augmentation problem. First, we design a new penalty function by subtracting the outliers from the cropped region of the generated instance to guide the generator to learn the features of outliers. After that, we combine the output value of the penalty function with the generator loss to jointly update the generator{\textquoteright}s parameters with back-propagation. Second, we propose a new evaluation approach that adopts two outlier detectors with k-fold cross-validation to assess the availability of generated instances. We conduct extensive experiments to demonstrate the significant performance improvement of EID-GAN on two extremely imbalanced datasets: industrial Piston and Fabric datasets; and one general imbalanced dataset: the public DAGM dataset. The experimental results show that our EID-GAN outperforms the SOTA augmentation models on different imbalanced datasets.",

keywords = "Data models, Detectors, Extremely imbalanced data augmentation, Fabrics, GAN, generated data evaluation, Generators, norm penalty function, Pistons, Prototypes, Training",

author = "Wei Li and Jinlin Chen and Jiannong Cao and Chao Ma and Jia Wang and Xiaohui Cui and Ping Chen",

note = "Publisher Copyright: IEEE",

year = "2022",

doi = "10.1109/TII.2022.3182781",

language = "English",

volume = "19",

pages = "1--10",

journal = "IEEE Transactions on Industrial Informatics",

issn = "1551-3203",

publisher = "IEEE",

number = "3",

}

TY - JOUR

T1 - EID-GAN

T2 - Generative Adversarial Nets for Extremely Imbalanced Data Augmentation

AU - Li, Wei

AU - Chen, Jinlin

AU - Cao, Jiannong

AU - Ma, Chao

AU - Wang, Jia

AU - Cui, Xiaohui

AU - Chen, Ping

N1 - Publisher Copyright: IEEE

PY - 2022

Y1 - 2022

N2 - Imbalanced data causes deep neural networks to output biased results, and it becomes more serious when facing extremely imbalanced data regarding the outliers with tiny size (the ratio of the outlier size to the image size is around 0.05%). Many data augmentation models are proposed to supplement imbalanced data to alleviate biased results. However, the existing augmentation models cannot synthesize tiny outliers, which makes the generated data unavailable. In this paper, we propose a new augmentation model named Extremely Imbalanced Data Augmentation Generative Adversarial Nets (EID-GAN) to address the extremely imbalanced data augmentation problem. First, we design a new penalty function by subtracting the outliers from the cropped region of the generated instance to guide the generator to learn the features of outliers. After that, we combine the output value of the penalty function with the generator loss to jointly update the generator’s parameters with back-propagation. Second, we propose a new evaluation approach that adopts two outlier detectors with k-fold cross-validation to assess the availability of generated instances. We conduct extensive experiments to demonstrate the significant performance improvement of EID-GAN on two extremely imbalanced datasets: industrial Piston and Fabric datasets; and one general imbalanced dataset: the public DAGM dataset. The experimental results show that our EID-GAN outperforms the SOTA augmentation models on different imbalanced datasets.

AB - Imbalanced data causes deep neural networks to output biased results, and it becomes more serious when facing extremely imbalanced data regarding the outliers with tiny size (the ratio of the outlier size to the image size is around 0.05%). Many data augmentation models are proposed to supplement imbalanced data to alleviate biased results. However, the existing augmentation models cannot synthesize tiny outliers, which makes the generated data unavailable. In this paper, we propose a new augmentation model named Extremely Imbalanced Data Augmentation Generative Adversarial Nets (EID-GAN) to address the extremely imbalanced data augmentation problem. First, we design a new penalty function by subtracting the outliers from the cropped region of the generated instance to guide the generator to learn the features of outliers. After that, we combine the output value of the penalty function with the generator loss to jointly update the generator’s parameters with back-propagation. Second, we propose a new evaluation approach that adopts two outlier detectors with k-fold cross-validation to assess the availability of generated instances. We conduct extensive experiments to demonstrate the significant performance improvement of EID-GAN on two extremely imbalanced datasets: industrial Piston and Fabric datasets; and one general imbalanced dataset: the public DAGM dataset. The experimental results show that our EID-GAN outperforms the SOTA augmentation models on different imbalanced datasets.

KW - Data models

KW - Detectors

KW - Extremely imbalanced data augmentation

KW - Fabrics

KW - GAN

KW - generated data evaluation

KW - Generators

KW - norm penalty function

KW - Pistons

KW - Prototypes

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85132699385&partnerID=8YFLogxK

U2 - 10.1109/TII.2022.3182781

DO - 10.1109/TII.2022.3182781

M3 - Article

AN - SCOPUS:85132699385

SN - 1551-3203

VL - 19

SP - 1

EP - 10

JO - IEEE Transactions on Industrial Informatics

JF - IEEE Transactions on Industrial Informatics

IS - 3

ER -
