Imbalanced data classification based on DB-SLSMOTE and random forest

Qi Han; Rui Yang; Zitong Wan; Shaozhi Chen; Mengjie Huang; Huiqing Wen

doi:10.1109/CAC51589.2020.9326743

Qi Han, Rui Yang^*, Zitong Wan, Shaozhi Chen, Mengjie Huang, Huiqing Wen

^*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

11 Citations (Scopus)

Abstract

The classification of imbalanced data has become a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters minority samples through Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF for minority samples in imbalanced data.
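
The oversampling stage outlined in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`dbscan`, `safe_level`, `db_slsmote`) and parameters (`eps`, `min_pts`, `k`, `n_new`, `seed`) are hypothetical, and the abstract does not give the paper's exact settings. The sketch assumes that DBSCAN noise points are not used as seeds, that the partner sample is drawn from the k nearest minority neighbours inside the same cluster, and that the interpolation gap follows the commonly described Safe-Level-SMOTE cases. The Random Forest step is omitted.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def safe_level(p, minority, majority, k):
    """Count minority samples among the k nearest neighbours of p."""
    tagged = [(math.dist(p, q), 1) for q in minority if q is not p]
    tagged += [(math.dist(p, q), 0) for q in majority]
    tagged.sort(key=lambda t: t[0])
    return sum(is_minority for _, is_minority in tagged[:k])


def db_slsmote(minority, majority, eps, min_pts, k, n_new, seed=0):
    """Generate up to n_new synthetic minority samples inside DBSCAN clusters."""
    rng = random.Random(seed)
    labels = dbscan(minority, eps, min_pts)
    clusters = {}
    for point, label in zip(minority, labels):
        if label != -1:  # assumption: noise points are never used as seeds
            clusters.setdefault(label, []).append(point)
    usable = [c for c in clusters.values() if len(c) >= 2]

    synthetic, attempts = [], 0
    while usable and len(synthetic) < n_new and attempts < 100 * n_new:
        attempts += 1
        members = rng.choice(usable)
        p = rng.choice(members)
        # Assumption: the partner n comes from p's nearest cluster members.
        nearest = sorted((q for q in members if q is not p),
                         key=lambda q: math.dist(p, q))[:k]
        n = rng.choice(nearest)
        sl_p = safe_level(p, minority, majority, k)
        sl_n = safe_level(n, minority, majority, k)
        if sl_n == 0:
            if sl_p == 0:
                continue  # both samples look unsafe: generate nothing
            gap = 0.0  # only p is safe: duplicate p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:
                gap = rng.uniform(0, 1)
            elif ratio > 1:
                gap = rng.uniform(0, 1 / ratio)  # stay closer to p
            else:
                gap = rng.uniform(1 - ratio, 1)  # stay closer to n
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, n)))
    return synthetic
```

The returned samples would be appended to the training set with the minority label before fitting a classifier, for example scikit-learn's `RandomForestClassifier`.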

Original language	English
Title of host publication	Proceedings - 2020 Chinese Automation Congress, CAC 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6271-6276
Number of pages	6
ISBN (Electronic)	9781728176871
DOIs	https://doi.org/10.1109/CAC51589.2020.9326743
Publication status	Published - 6 Nov 2020
Event	2020 Chinese Automation Congress, CAC 2020 - Shanghai, China Duration: 6 Nov 2020 → 8 Nov 2020

Publication series

Name	Proceedings - 2020 Chinese Automation Congress, CAC 2020

Conference

Conference	2020 Chinese Automation Congress, CAC 2020
Country/Territory	China
City	Shanghai
Period	6/11/20 → 8/11/20

Keywords

DB-SLSMOTE
DBSCAN
Imbalanced data classification
Random Forest
Safe-Level-SMOTE

Access to Document

10.1109/CAC51589.2020.9326743

Cite this

Han, Q., Yang, R., Wan, Z., Chen, S., Huang, M., & Wen, H. (2020). Imbalanced data classification based on DB-SLSMOTE and random forest. In Proceedings - 2020 Chinese Automation Congress, CAC 2020 (pp. 6271-6276). Article 9326743 (Proceedings - 2020 Chinese Automation Congress, CAC 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CAC51589.2020.9326743

@inproceedings{8303e7f92e334cc086675012776dac86,

title = "Imbalanced data classification based on DB-SLSMOTE and random forest",

abstract = "The classification of imbalanced data has become a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters minority samples through Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF for minority samples in imbalanced data.",

keywords = "DB-SLSMOTE, DBSCAN, Imbalanced data classification, Random Forest, Safe-Level-SMOTE",

author = "Qi Han and Rui Yang and Zitong Wan and Shaozhi Chen and Mengjie Huang and Huiqing Wen",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 Chinese Automation Congress, CAC 2020 ; Conference date: 06-11-2020 Through 08-11-2020",

year = "2020",

month = nov,

day = "6",

doi = "10.1109/CAC51589.2020.9326743",

language = "English",

series = "Proceedings - 2020 Chinese Automation Congress, CAC 2020",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "6271--6276",

booktitle = "Proceedings - 2020 Chinese Automation Congress, CAC 2020",

}

Han, Q, Yang, R, Wan, Z, Chen, S, Huang, M & Wen, H 2020, Imbalanced data classification based on DB-SLSMOTE and random forest. in Proceedings - 2020 Chinese Automation Congress, CAC 2020., 9326743, Proceedings - 2020 Chinese Automation Congress, CAC 2020, Institute of Electrical and Electronics Engineers Inc., pp. 6271-6276, 2020 Chinese Automation Congress, CAC 2020, Shanghai, China, 6/11/20. https://doi.org/10.1109/CAC51589.2020.9326743

Imbalanced data classification based on DB-SLSMOTE and random forest. / Han, Qi; Yang, Rui; Wan, Zitong et al.
Proceedings - 2020 Chinese Automation Congress, CAC 2020. Institute of Electrical and Electronics Engineers Inc., 2020. p. 6271-6276 9326743 (Proceedings - 2020 Chinese Automation Congress, CAC 2020).

TY - GEN

T1 - Imbalanced data classification based on DB-SLSMOTE and random forest

AU - Han, Qi

AU - Yang, Rui

AU - Wan, Zitong

AU - Chen, Shaozhi

AU - Huang, Mengjie

AU - Wen, Huiqing

PY - 2020/11/6

Y1 - 2020/11/6

N2 - The classification of imbalanced data has become a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters minority samples through Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF for minority samples in imbalanced data.

AB - The classification of imbalanced data has become a popular problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, which results in many minority samples being misclassified. To address this problem, this paper proposes a Density Based Safe Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters minority samples through Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied to the clusters of arbitrary shape discovered by DBSCAN. Finally, the processed data is classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF for minority samples in imbalanced data.

KW - DB-SLSMOTE

KW - DBSCAN

KW - Imbalanced data classification

KW - Random Forest

KW - Safe-Level-SMOTE

UR - http://www.scopus.com/inward/record.url?scp=85100915792&partnerID=8YFLogxK

U2 - 10.1109/CAC51589.2020.9326743

DO - 10.1109/CAC51589.2020.9326743

M3 - Conference Proceeding

AN - SCOPUS:85100915792

T3 - Proceedings - 2020 Chinese Automation Congress, CAC 2020

SP - 6271

EP - 6276

BT - Proceedings - 2020 Chinese Automation Congress, CAC 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2020 Chinese Automation Congress, CAC 2020

Y2 - 6 November 2020 through 8 November 2020

ER -
