TY - GEN
T1 - Imbalanced data classification based on DB-SLSMOTE and random forest
AU - Han, Qi
AU - Yang, Rui
AU - Wan, Zitong
AU - Chen, Shaozhi
AU - Huang, Mengjie
AU - Wen, Huiqing
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11/6
Y1 - 2020/11/6
N2 - Classifying imbalanced data has been a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters the minority samples using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF on minority samples in imbalanced data.
AB - Classifying imbalanced data has been a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters the minority samples using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF on minority samples in imbalanced data.
KW - DB-SLSMOTE
KW - DBSCAN
KW - Imbalanced data classification
KW - Random Forest
KW - Safe-Level-SMOTE
UR - http://www.scopus.com/inward/record.url?scp=85100915792&partnerID=8YFLogxK
U2 - 10.1109/CAC51589.2020.9326743
DO - 10.1109/CAC51589.2020.9326743
M3 - Conference Proceeding
AN - SCOPUS:85100915792
T3 - Proceedings - 2020 Chinese Automation Congress, CAC 2020
SP - 6271
EP - 6276
BT - Proceedings - 2020 Chinese Automation Congress, CAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 Chinese Automation Congress, CAC 2020
Y2 - 6 November 2020 through 8 November 2020
ER -