An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach

Biru Xu; Wenjia Wang; Rui Yang; Qi Han

doi:10.1109/BDAI52447.2021.9515306

Biru Xu, Wenjia Wang, Rui Yang^*, Qi Han

^*Corresponding author for this work

Department of Intelligent Science

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

The problem of data imbalance has received far-reaching attention because it can reduce classification accuracy in machine learning. Because traditional classifiers tend to ignore minority class instances, it is necessary to improve the recognition rate of those instances. This paper therefore proposes a new hybrid sampling method that addresses data imbalance by enlarging the proportion of minority instances. For the oversampling part, a variant of SMOTE is provided that combines LR-SMOTE and CCR (Combined Cleaning and Resampling); for the under-sampling part, the Tomek-link method is used. After this pre-processing stage, the data set is classified by Random Forest (RF). Experimental results show that the proposed algorithm improves the performance of RF on the data set, achieving higher accuracy.
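The oversample-then-clean pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how LR-SMOTE and CCR are combined, so plain SMOTE is swapped in for the oversampling step, and the Random Forest stage is left out. The function names (`smote`, `remove_tomek_links`, `hybrid_resample`), the parameters `k` and `seed`, and the choice to drop only the majority member of each Tomek link are all assumptions made for illustration. The sketch also assumes at least two minority samples.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return up to k candidates closest to point (Euclidean), excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Plain SMOTE (stand-in for the paper's variant): interpolate between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(nearest_neighbors(base, minority, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def remove_tomek_links(X, y, majority_label):
    """Drop majority samples that are in a Tomek link, i.e. that are mutual
    nearest neighbours with a sample of a different class."""
    def nn_index(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nn_index(i) for i in range(len(X))]
    drop = {i for i in range(len(X))
            if nn[nn[i]] == i and y[i] != y[nn[i]] and y[i] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid_resample(X, y, minority_label, majority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean
    the class boundary by removing majority-side Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_major = sum(1 for label in y if label == majority_label)
    n_new = max(0, n_major - len(minority))
    synthetic = smote(minority, n_new, k=k, rng=random.Random(seed))
    X_over = list(X) + synthetic
    y_over = list(y) + [minority_label] * n_new
    return remove_tomek_links(X_over, y_over, majority_label)
```

Samples are expected as tuples of floats with a parallel list of labels. The resampled output would then be passed to a classifier such as Random Forest, which is the stage the abstract describes after pre-processing.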

Original language	English
Title of host publication	2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	125-129
Number of pages	5
ISBN (Electronic)	9781665412704
DOIs	https://doi.org/10.1109/BDAI52447.2021.9515306
Publication status	Published - 2 Jul 2021
Event	2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021 - Qingdao, China. Duration: 2 Jul 2021 → 4 Jul 2021

Publication series

Name	2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021

Conference

Conference	2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
Country/Territory	China
City	Qingdao
Period	2/07/21 → 4/07/21

Keywords

data mining
hybrid sampling
imbalanced dataset
smote

Access to Document

10.1109/BDAI52447.2021.9515306

Cite this

Xu, B., Wang, W., Yang, R., & Han, Q. (2021). An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach. In 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021 (pp. 125-129). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BDAI52447.2021.9515306

@inproceedings{a083e1ad20a749b5977113d105db2b9c,
  title = "An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach",
  abstract = "The problem of data imbalance has received far-reaching concerns since they could affect the accuracy of classification problem in the area of machine learning. As the minority class instances can be ignored by traditional classifiers, it is necessary to improve the recognition rate of minority instances. Therefore, the paper proposes a new hybrid sampling method to solve the data imbalance problem by enlarging the proportion of minority instances. For the oversampling part, a variant of SMOTE is provided combining methods of LR-SMOTE and CCR (Combined Cleaning and Resampling Algorithm); for the under-sampling part, the Tomek-link method is utilized to complete the task. After the pre-processing stage, the data set is classified by Random Forest (RF). Experimental results show that the novel algorithm effectively enhances the performance of RF on the data set with a higher accuracy.",
  keywords = "data mining, hybrid sampling, imbalanced dataset, smote",
  author = "Biru Xu and Wenjia Wang and Rui Yang and Qi Han",
  note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021 ; Conference date: 02-07-2021 Through 04-07-2021",
  year = "2021",
  month = jul,
  day = "2",
  doi = "10.1109/BDAI52447.2021.9515306",
  language = "English",
  series = "2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "125--129",
  booktitle = "2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021",
}

TY - GEN
T1 - An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach
AU - Xu, Biru
AU - Wang, Wenjia
AU - Yang, Rui
AU - Han, Qi
PY - 2021/7/2
Y1 - 2021/7/2
AB - The problem of data imbalance has received far-reaching concerns since they could affect the accuracy of classification problem in the area of machine learning. As the minority class instances can be ignored by traditional classifiers, it is necessary to improve the recognition rate of minority instances. Therefore, the paper proposes a new hybrid sampling method to solve the data imbalance problem by enlarging the proportion of minority instances. For the oversampling part, a variant of SMOTE is provided combining methods of LR-SMOTE and CCR (Combined Cleaning and Resampling Algorithm); for the under-sampling part, the Tomek-link method is utilized to complete the task. After the pre-processing stage, the data set is classified by Random Forest (RF). Experimental results show that the novel algorithm effectively enhances the performance of RF on the data set with a higher accuracy.
KW - data mining
KW - hybrid sampling
KW - imbalanced dataset
KW - smote
UR - http://www.scopus.com/inward/record.url?scp=85114507709&partnerID=8YFLogxK
U2 - 10.1109/BDAI52447.2021.9515306
DO - 10.1109/BDAI52447.2021.9515306
M3 - Conference Proceeding
AN - SCOPUS:85114507709
T3 - 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
SP - 125
EP - 129
BT - 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE 4th International Conference on Big Data and Artificial Intelligence, BDAI 2021
Y2 - 2 July 2021 through 4 July 2021
ER -
