An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach

Biru Xu, Wenjia Wang, Rui Yang*, Qi Han

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

5 Citations (Scopus)

Abstract

The problem of data imbalance has received far-reaching concern because it can reduce classification accuracy in machine learning. Because traditional classifiers can ignore minority class instances, it is necessary to improve the recognition rate of minority instances. Therefore, the paper proposes a new hybrid sampling method that addresses the data imbalance problem by enlarging the proportion of minority instances. For the oversampling part, a variant of SMOTE is provided that combines LR-SMOTE and CCR (Combined Cleaning and Resampling algorithm); for the under-sampling part, the Tomek-link method is used. After the pre-processing stage, the data set is classified by Random Forest (RF). Experimental results show that the novel algorithm effectively enhances the performance of RF on the data set, yielding higher accuracy.
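
The hybrid sampling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not detail the LR-SMOTE and CCR variant, so plain SMOTE interpolation is swapped in for the oversampling step, followed by Tomek-link removal of majority points. The function names `smote_oversample` and `remove_tomek_links`, the toy data, and the parameters `k` and `seed` are all hypothetical choices made for illustration.

```python
import math
import random


def smote_oversample(X, y, minority, n_new, k=2, seed=0):
    """Plain SMOTE (assumed stand-in for the paper's variant):
    create n_new points on segments between minority neighbours."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # k nearest minority neighbours of the chosen base point
        neighbours = sorted((p for p in pts if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return X + synthetic, y + [minority] * n_new


def remove_tomek_links(X, y, majority):
    """Drop the majority member of every Tomek link."""
    n = len(X)
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: math.dist(X[i], X[j])) for i in range(n)]
    drop = {i for i in range(n)
            if nearest[nearest[i]] == i      # mutual nearest neighbours
            and y[i] != y[nearest[i]]        # from different classes
            and y[i] == majority}            # remove only the majority point
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 3 minority points (label 1), 7 majority points
# (label 0), one of which sits inside the minority cluster.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (2.0, 2.0), (2.2, 2.1), (2.1, 1.8), (1.9, 2.2),
     (2.3, 1.9), (2.0, 2.4), (0.25, 0.12)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Oversample the minority class up to the majority count, then clean.
n_new = y.count(0) - y.count(1)
X_over, y_over = smote_oversample(X, y, minority=1, n_new=n_new)
X_res, y_res = remove_tomek_links(X_over, y_over, majority=0)
print(len(y), len(y_over), len(y_res))
```

The resampled `X_res` and `y_res` would then be passed to a classifier; the abstract names Random Forest, for which something like scikit-learn's `RandomForestClassifier` could serve, though the sketch leaves that step out to stay dependency-free.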

Original language: English
Title of host publication: 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 125-129
Number of pages: 5
ISBN (Electronic): 9781665412704
Publication status: Published - 2 Jul 2021
Event: 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021 - Qingdao, China
Duration: 2 Jul 2021 → 4 Jul 2021

Publication series

Name: 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021

Conference

Conference: 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
Country/Territory: China
City: Qingdao
Period: 2/07/21 → 4/07/21

Keywords

  • data mining
  • hybrid sampling
  • imbalanced dataset
  • SMOTE
