Clustering-based incremental learning for imbalanced data classification

Yuxin Liu; Guangyu Du; Chenke Yin; Hachao Zhang; Jia Wang

doi:10.1016/j.knosys.2024.111612

Corresponding author: Jia Wang

School of Advanced Technology

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.
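The general idea in the abstract (cluster the data into roughly balanced sub-datasets, then let a classifier learn them one after another) can be illustrated with a toy Python sketch. This is not the authors' implementation: the abstract does not specify the "two-step clustering" or the incremental learner, so a single k-means pass over the majority class and a small logistic model trained by SGD stand in for them. All names here (`kmeans`, `build_sub_datasets`, `IncrementalLogistic`) and the synthetic data are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # New center = mean of the cluster; an empty cluster keeps its old center.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def build_sub_datasets(majority, minority, seed=0):
    """Assumed rebalancing: pair each majority cluster with all minority samples."""
    k = max(1, round(len(majority) / len(minority)))
    rng = random.Random(seed)
    subsets = []
    for cluster in kmeans(majority, k, seed=seed):
        rows = [(p, 0) for p in cluster] + [(p, 1) for p in minority]
        rng.shuffle(rows)
        subsets.append(([p for p, _ in rows], [t for _, t in rows]))
    return subsets


class IncrementalLogistic:
    """Logistic regression whose weights persist across partial_fit calls."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y, epochs=50):
        for _ in range(epochs):
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0


def blob(center, n, rng, std=0.5):
    """Synthetic 2-D Gaussian blob, used only for this illustration."""
    return [(rng.gauss(center[0], std), rng.gauss(center[1], std)) for _ in range(n)]


rng = random.Random(42)
majority = blob((0.0, 0.0), 100, rng) + blob((4.0, 0.0), 100, rng)  # label 0
minority = blob((2.0, 4.0), 20, rng)                                # label 1

sub_datasets = build_sub_datasets(majority, minority)
model = IncrementalLogistic(n_features=2)
for X, y in sub_datasets:  # the classifier sees one sub-dataset at a time
    model.partial_fit(X, y)

recall = sum(model.predict(x) for x in minority) / len(minority)
print(f"sub-datasets: {len(sub_datasets)}, minority recall: {recall:.2f}")
```

Because every sub-dataset in this sketch reuses the full minority class, no majority samples are discarded and no synthetic samples are generated. Plain sequential training of this kind can drift toward the most recent sub-dataset, which a real incremental learning scheme would need to address.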

Original language	English
Article number	111612
Journal	Knowledge-Based Systems
Volume	292
DOIs	https://doi.org/10.1016/j.knosys.2024.111612
Publication status	Published - 23 May 2024

Keywords

Classification
Clustering
DIRL
Imbalance data
Incremental learning

Cite this

@article{02a64f551cb2457c9753eb9b14fd7bde,

title = "Clustering-based incremental learning for imbalanced data classification",

abstract = "Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.",

keywords = "Classification, Clustering, DIRL, Imbalance data, Incremental learning",

author = "Yuxin Liu and Guangyu Du and Chenke Yin and Hachao Zhang and Jia Wang",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = may,

day = "23",

doi = "10.1016/j.knosys.2024.111612",

language = "English",

volume = "292",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier",

}

TY - JOUR

T1 - Clustering-based incremental learning for imbalanced data classification

AU - Liu, Yuxin

AU - Du, Guangyu

AU - Yin, Chenke

AU - Zhang, Hachao

AU - Wang, Jia

PY - 2024/5/23

Y1 - 2024/5/23

N2 - Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.

AB - Imbalanced data classification presents a significant challenge when there is a substantial disparity in sample sizes across different classes. This issue severely affects classifier accuracy in predicting minority classes, hampering numerous real-world applications. Traditional methods address data imbalance by using undersampling or oversampling techniques. However, these methods may lead to information loss during sample reduction or introduce noise and model bias through synthetic sample generation. In this paper, we introduce DRIL, an innovative clustering-based incremental learning approach designed to overcome these limitations and improve the classification of minority class samples. Specifically, we employ a “two-step clustering” method to rebalance the dataset, partitioning it into similar and representative sub-datasets. Subsequently, incremental learning is applied to enable the classifier to gradually acquire knowledge about these sub-datasets, establishing a comprehensive understanding of all features present in the imbalanced dataset. Experimental results on twenty datasets demonstrate that our incremental learning-based algorithm outperforms baseline methods in correctly classifying minority classes while exhibiting improved precision and F1 score performance.

KW - Classification

KW - Clustering

KW - DIRL

KW - Imbalance data

KW - Incremental learning

UR - http://www.scopus.com/inward/record.url?scp=85187807040&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2024.111612

DO - 10.1016/j.knosys.2024.111612

M3 - Article

AN - SCOPUS:85187807040

SN - 0950-7051

VL - 292

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

M1 - 111612

ER -
