TY - JOUR
T1 - Clustering-based incremental learning for imbalanced data classification
AU - Liu, Yuxin
AU - Du, Guangyu
AU - Yin, Chenke
AU - Zhang, Hachao
AU - Wang, Jia
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/5/23
Y1 - 2024/5/23
N2 - Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.
AB - Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.
KW - Classification
KW - Clustering
KW - DRIL
KW - Imbalance data
KW - Incremental learning
UR - http://www.scopus.com/inward/record.url?scp=85187807040&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111612
DO - 10.1016/j.knosys.2024.111612
M3 - Article
AN - SCOPUS:85187807040
SN - 0950-7051
VL - 292
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111612
ER -