Mixture copulas with discrete margins and their application to imbalanced data

Yujian Liu, Dejun Xie*, David A. Edwards, Siyi Yu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins. We then apply the resulting models to the class imbalance problem in data science through oversampling. The methodology makes it possible to learn and sample from data sets in which discrete and continuous features occur together. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of features, and classic random oversampling simply duplicates existing points, which gives the classifier no new information and makes it prone to overfitting. Copula methods allow new points to be generated with the correlation structure learned from the training set, which reduces overfitting. Following the description of the methodology, the article reports experiments on synthetic and real data. The results support the validity of the approach in comparison with benchmark methods.
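The contrast between duplicating points and sampling from a learned dependence structure can be illustrated with a minimal sketch. It is not the Bayesian mixture copula of the article: as a simplifying assumption it fits a single Gaussian copula to mid-rank normal scores and inverts the empirical margins, so discrete features stay on their observed support. All names (`normal_scores`, `copula_oversample`, the `ridge` parameter, the toy `minority` data) are hypothetical and chosen for illustration only.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the copula transform


def normal_scores(column):
    """Map a feature column to normal scores via mid-ranks (ties share a rank)."""
    order = sorted(column)
    n = len(order)
    scores = []
    for value in column:
        # The mid-rank keeps u strictly inside (0, 1), even for tied discrete values.
        u = (bisect_left(order, value) + bisect_right(order, value)) / (2 * n)
        scores.append(STD_NORMAL.inv_cdf(u))
    return scores


def correlation(a, b):
    """Pearson correlation; assumes neither column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def copula_oversample(rows, n_new, seed=0, ridge=1e-6):
    """Draw n_new synthetic rows from a Gaussian copula fitted to `rows`."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    d, n = len(columns), len(rows)
    scores = [normal_scores(col) for col in columns]
    # A small ridge on the off-diagonal keeps the matrix positive definite.
    corr = [[1.0 if i == j else (1 - ridge) * correlation(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    lower = cholesky(corr)
    sorted_cols = [sorted(col) for col in columns]
    samples = []
    for _ in range(n_new):
        noise = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(d)]
        u = [STD_NORMAL.cdf(v) for v in z]
        # Empirical quantile: every sampled value is one observed in that column.
        samples.append(tuple(sorted_cols[j][min(int(u[j] * n), n - 1)]
                             for j in range(d)))
    return samples


# Toy minority class: one continuous feature and one discrete feature.
minority = [(0.5, 0), (1.1, 0), (1.9, 1), (2.4, 1), (3.2, 2), (3.8, 2)]
new_points = copula_oversample(minority, n_new=20, seed=1)
```

Because the margins are inverted empirically, the sketch can only recombine observed feature values into new rows; a parametric or smoothed margin would be needed to produce unseen continuous values, and a mixture of copulas, as in the article, would capture richer dependence than a single Gaussian copula.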

Original language: English
Pages (from-to): 878-900
Number of pages: 23
Journal: Journal of the Korean Statistical Society
Volume: 52
Issue number: 4
Publication status: Published - Dec 2023

Keywords

  • Bayesian analysis
  • Copula methods
  • Dependence analysis
  • Imbalanced learning
  • Oversampling
