Mixture copulas with discrete margins and their application to imbalanced data

Yujian Liu; Dejun Xie; David A. Edwards; Siyi Yu

doi:10.1007/s42952-023-00226-3


Yujian Liu, Dejun Xie^*, David A. Edwards, Siyi Yu

^*Corresponding author for this work

School of Mathematics and Physics

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins. We further apply our models to address class-imbalanced problems in data science through oversampling. The methodology makes it possible to learn from and sample from data sets in which discrete and continuous features coexist. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of factors in a data set. Classic random oversampling simply duplicates existing points, which gives the classifiers no new information and makes overfitting likely. Copula methods enable us to generate new points that follow the correlation structure learned from the training set, which reduces overfitting. Following the introduction of the methodology, the article presents experiments with synthetic and real data. The results show the validity of the approach when compared with benchmark methods.
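The abstract contrasts two kinds of oversampling. Random oversampling only duplicates existing minority points. Copula-based oversampling generates new points that follow a learned dependence structure. The sketch below illustrates that contrast under simplifying assumptions:

- It fits a single Gaussian copula with empirical margins. This is not the authors' Bayesian mixture copula.
- It handles ties in discrete columns with mid-ranks. This is a common heuristic, not the paper's treatment of discrete margins.
- It uses only the Python standard library.

All function names (`random_oversample`, `copula_oversample` and the helpers) and the toy data are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def random_oversample(rows, n_new, seed=0):
    """Classic random oversampling: every new row duplicates an existing row."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(n_new)]


def normal_scores(column):
    """Rank-based normal scores; tied (discrete) values share one mid-rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and column[order[j + 1]] == column[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    return [STD_NORMAL.inv_cdf(r / (n + 1)) for r in ranks]


def correlation(cols):
    """Pearson correlation matrix of the given columns (diagonal forced to 1)."""
    d, n = len(cols), len(cols[0])
    means = [sum(c) / n for c in cols]
    cov = [[sum((cols[a][t] - means[a]) * (cols[b][t] - means[b]) for t in range(n)) / n
            for b in range(d)] for a in range(d)]
    sd = [max(cov[a][a], 1e-12) ** 0.5 for a in range(d)]  # guard constant columns
    return [[1.0 if a == b else cov[a][b] / (sd[a] * sd[b]) for b in range(d)]
            for a in range(d)]


def cholesky(m, ridge=1e-9):
    """Lower-triangular L with L L^T close to m; a small ridge keeps it stable."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = max(m[i][i] + ridge - s, ridge) ** 0.5
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def inverse_margin(sorted_col, u, discrete):
    """Map a uniform value u back to the data scale via the empirical quantile."""
    n = len(sorted_col)
    if discrete:
        # Step function: only levels observed in the training data can appear.
        return sorted_col[min(int(u * n), n - 1)]
    # Linear interpolation between order statistics for continuous columns.
    pos = u * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return sorted_col[lo] + (pos - lo) * (sorted_col[hi] - sorted_col[lo])


def copula_oversample(rows, discrete_cols, n_new, seed=0):
    """Gaussian-copula oversampling: new rows share the fitted dependence structure."""
    rng = random.Random(seed)
    d = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(d)]
    L = cholesky(correlation([normal_scores(c) for c in cols]))
    sorted_cols = [sorted(c) for c in cols]
    new_rows = []
    for _ in range(n_new):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(d)]  # correlated normals
        new_rows.append(tuple(
            inverse_margin(sorted_cols[j], STD_NORMAL.cdf(z[j]), j in discrete_cols)
            for j in range(d)))
    return new_rows


# Toy minority class: column 0 is continuous, column 1 is discrete (a count).
minority = [(1.2, 0), (2.5, 1), (3.1, 1), (4.8, 2), (5.0, 2), (6.3, 3)]
duplicates = random_oversample(minority, n_new=5)
synthetic = copula_oversample(minority, discrete_cols={1}, n_new=5)
```

Every row returned by `random_oversample` is already in the training set. `copula_oversample` instead draws correlated normal scores and maps them back through each column's empirical quantile. As a result, it can produce new combinations of values. Discrete columns only take levels that were observed, while continuous columns are interpolated between observed values.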

Original language	English
Pages (from-to)	878-900
Number of pages	23
Journal	Journal of the Korean Statistical Society
Volume	52
Issue number	4
DOIs	https://doi.org/10.1007/s42952-023-00226-3
Publication status	Published - Dec 2023

Keywords

Bayesian analysis
Copula methods
Dependence analysis
Imbalanced learning
Oversampling


Cite this

@article{64dab80b2c8e4710ab114399708c15af,
  title = "Mixture copulas with discrete margins and their application to imbalanced data",
  abstract = "This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins. We further apply our models to address class-imbalanced problems in data science through oversampling. The methodology makes it possible to learn from and sample from data sets in which discrete and continuous features coexist. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of factors in a data set. Classic random oversampling simply duplicates existing points, which gives the classifiers no new information and makes overfitting likely. Copula methods enable us to generate new points that follow the correlation structure learned from the training set, which reduces overfitting. Following the introduction of the methodology, the article presents experiments with synthetic and real data. The results show the validity of the approach when compared with benchmark methods.",
  keywords = "Bayesian analysis, Copula methods, Dependence analysis, Imbalanced learning, Oversampling",
  author = "Liu, Yujian and Xie, Dejun and Edwards, David A. and Yu, Siyi",
  note = "Publisher Copyright: {\textcopyright} 2023, Korean Statistical Society.",
  year = "2023",
  month = dec,
  doi = "10.1007/s42952-023-00226-3",
  language = "English",
  volume = "52",
  pages = "878--900",
  journal = "Journal of the Korean Statistical Society",
  issn = "1226-3192",
  number = "4",
}

TY  - JOUR
T1  - Mixture copulas with discrete margins and their application to imbalanced data
AU  - Liu, Yujian
AU  - Xie, Dejun
AU  - Edwards, David A.
AU  - Yu, Siyi
PY  - 2023/12
Y1  - 2023/12
N2  - This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins. We further apply our models to address class-imbalanced problems in data science through oversampling. The methodology makes it possible to learn from and sample from data sets in which discrete and continuous features coexist. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of factors in a data set. Classic random oversampling simply duplicates existing points, which gives the classifiers no new information and makes overfitting likely. Copula methods enable us to generate new points that follow the correlation structure learned from the training set, which reduces overfitting. Following the introduction of the methodology, the article presents experiments with synthetic and real data. The results show the validity of the approach when compared with benchmark methods.
AB  - This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins. We further apply our models to address class-imbalanced problems in data science through oversampling. The methodology makes it possible to learn from and sample from data sets in which discrete and continuous features coexist. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of factors in a data set. Classic random oversampling simply duplicates existing points, which gives the classifiers no new information and makes overfitting likely. Copula methods enable us to generate new points that follow the correlation structure learned from the training set, which reduces overfitting. Following the introduction of the methodology, the article presents experiments with synthetic and real data. The results show the validity of the approach when compared with benchmark methods.
KW  - Bayesian analysis
KW  - Copula methods
KW  - Dependence analysis
KW  - Imbalanced learning
KW  - Oversampling
UR  - http://www.scopus.com/inward/record.url?scp=85170255171&partnerID=8YFLogxK
U2  - 10.1007/s42952-023-00226-3
DO  - 10.1007/s42952-023-00226-3
M3  - Article
AN  - SCOPUS:85170255171
SN  - 1226-3192
VL  - 52
SP  - 878
EP  - 900
JO  - Journal of the Korean Statistical Society
JF  - Journal of the Korean Statistical Society
IS  - 4
ER  - 
