Imbalanced data classification based on DB-SLSMOTE and random forest

Qi Han, Rui Yang*, Zitong Wan, Shaozhi Chen, Mengjie Huang, Huiqing Wen

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

9 Citations (Scopus)

Abstract

Classification of imbalanced data has become a prominent problem in machine learning in recent years. On imbalanced data, traditional classification algorithms tend to assign minority class samples to the majority class, so many minority samples are misclassified. To address this, this paper proposes a Density-Based Safe-Level Synthetic Minority Oversampling TEchnique (DB-SLSMOTE). First, the algorithm clusters the minority samples using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Then, the Safe-Level Synthetic Minority Oversampling TEchnique (Safe-Level-SMOTE) is applied within the arbitrarily shaped clusters discovered by DBSCAN. Finally, the processed data are classified by Random Forest (RF). The experimental results show that the DB-SLSMOTE algorithm can effectively improve the classification performance of RF on minority samples in imbalanced data.
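The oversampling steps described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the abstract does not say how safe levels are computed inside clusters, so the sketch assumes the usual Safe-Level-SMOTE rule (safe level = number of minority samples among the k nearest neighbours in the full data set) and restricts the interpolation partner to the same DBSCAN cluster. The function names (`dbscan`, `db_slsmote`), the parameters (`eps`, `min_pts`, `k`, `n_new`), the random sampling loop, and the toy data are all hypothetical.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one cluster id per point, -1 for noise."""
    def region(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = region(j)
            if len(reach) >= min_pts:      # only core points expand the cluster
                seeds.extend(reach)
    return labels

def k_nearest(pool, target, X, k):
    """The k indices in pool closest to X[target], excluding target itself."""
    others = [i for i in pool if i != target]
    return sorted(others, key=lambda i: math.dist(X[target], X[i]))[:k]

def safe_level(i, X, y, k):
    """Number of minority samples (label 1) among the k nearest neighbours."""
    return sum(1 for j in k_nearest(range(len(X)), i, X, k) if y[j] == 1)

def db_slsmote(X, y, eps, min_pts, k, n_new, seed=0):
    """Assumed variant: Safe-Level-SMOTE applied inside each DBSCAN cluster."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == 1]
    cluster_ids = dbscan([X[i] for i in minority], eps, min_pts)
    clusters = {}
    for i, c in zip(minority, cluster_ids):
        if c != -1:                        # noise points are never oversampled
            clusters.setdefault(c, []).append(i)
    usable = [m for m in clusters.values() if len(m) >= 2]
    synthetic, attempts = [], 0
    while usable and len(synthetic) < n_new and attempts < 50 * n_new:
        attempts += 1
        members = rng.choice(usable)
        p = rng.choice(members)
        n = rng.choice(k_nearest(members, p, X, k))
        sl_p, sl_n = safe_level(p, X, y, k), safe_level(n, X, y, k)
        if sl_p == 0 and sl_n == 0:
            continue                       # both look like noise: skip
        if sl_n == 0:
            gap = 0.0                      # stay on p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:
                gap = rng.random()
            elif ratio > 1:
                gap = rng.uniform(0, 1 / ratio)   # stay closer to p
            else:
                gap = rng.uniform(1 - ratio, 1)   # stay closer to n
        synthetic.append([a + gap * (b - a) for a, b in zip(X[p], X[n])])
    return synthetic

# Toy data: two tight minority clusters, one isolated minority point, 20 majority points.
minority_pts = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),
                (5, 5), (5.2, 5), (5, 5.2), (5.2, 5.2), (10, 0)]
majority_pts = [(2 + 0.25 * i, 2 + 0.25 * j) for i in range(5) for j in range(4)]
X = minority_pts + majority_pts
y = [1] * len(minority_pts) + [0] * len(majority_pts)
synthetic = db_slsmote(X, y, eps=0.5, min_pts=3, k=3, n_new=10)
```

In this sketch the isolated minority point is labelled noise by DBSCAN and therefore never used as a seed. The synthetic samples would then be appended to the training set before fitting a Random Forest, for example with scikit-learn's `RandomForestClassifier`.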

Original language: English
Title of host publication: Proceedings - 2020 Chinese Automation Congress, CAC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6271-6276
Number of pages: 6
ISBN (Electronic): 9781728176871
Publication status: Published - 6 Nov 2020
Event: 2020 Chinese Automation Congress, CAC 2020 - Shanghai, China
Duration: 6 Nov 2020 to 8 Nov 2020

Publication series

Name: Proceedings - 2020 Chinese Automation Congress, CAC 2020

Conference

Conference: 2020 Chinese Automation Congress, CAC 2020
Country/Territory: China
City: Shanghai
Period: 6/11/20 to 8/11/20

Keywords

  • DB-SLSMOTE
  • DBSCAN
  • Imbalanced data classification
  • Random Forest
  • Safe-Level-SMOTE
