Abstract
This article introduces an approach that uses Bayesian sampling to estimate mixture copulas with discrete margins, and further applies these models to class-imbalance problems in data science through oversampling. The methodology makes it possible to learn from, and sample from, data sets in which discrete and continuous features exist simultaneously. By contrast, the classic SMOTE algorithm does not naturally account for the discreteness of features, and classic random oversampling simply duplicates existing points, which gives the classifiers no new information and makes them prone to overfitting. Copula methods instead generate new points that follow the correlation structure learned from the training set, which reduces overfitting. After introducing the methodology, the article reports experiments on synthetic and real data. The results show the validity of the approach when compared with the benchmark methods.
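The core idea above is to learn the dependence structure of the minority class and sample new points from it rather than duplicating rows. It can be illustrated with a much simpler stand-in. The sketch below is not the authors' Bayesian mixture-copula method. It assumes a single Gaussian copula fitted from rank-based normal scores and handles discrete margins only crudely: tied values share an averaged rank and are mapped back through the empirical step function. The function name `copula_oversample` and its parameters are hypothetical, and only NumPy and the standard library are used.

```python
import numpy as np
from statistics import NormalDist

# Illustrative Gaussian-copula oversampler (hypothetical; NOT the article's
# Bayesian mixture-copula estimator, and discrete margins are treated crudely).
_STD_NORMAL = NormalDist()  # standard normal, used for Phi and Phi^{-1}


def _average_ranks(x):
    """Ranks 1..n, with tied values (e.g. in a discrete column) given their mean rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def copula_oversample(X, n_new, discrete_cols=(), seed=0):
    """Draw n_new synthetic rows that mimic the rank dependence of X (minority class only)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # 1. Map each column to normal scores via its (tie-averaged) ranks.
    Z = np.column_stack([
        [_STD_NORMAL.inv_cdf(r / (n + 1)) for r in _average_ranks(X[:, j])]
        for j in range(d)
    ])
    # 2. Estimate the Gaussian copula correlation matrix from the normal scores.
    R = np.corrcoef(Z, rowvar=False)
    # 3. Sample latent normals with that correlation and push them to uniforms.
    rng = np.random.default_rng(seed)
    Z_new = rng.multivariate_normal(np.zeros(d), R, size=n_new)
    U = np.vectorize(_STD_NORMAL.cdf)(Z_new)
    # 4. Invert each empirical marginal: step function for discrete columns
    #    (only observed values come back), linear interpolation between order
    #    statistics for continuous columns (new values inside the observed range).
    out = np.empty((n_new, d))
    for j in range(d):
        col = np.sort(X[:, j])
        if j in discrete_cols:
            idx = np.minimum((U[:, j] * n).astype(int), n - 1)
            out[:, j] = col[idx]
        else:
            out[:, j] = np.interp(U[:, j] * (n - 1), np.arange(n), col)
    return out
```

Applied to a minority class with one continuous and one integer-valued column, for example `copula_oversample(X_min, 2000, discrete_cols=(1,))`, the discrete column keeps only its observed values while the continuous column receives new values that preserve the learned correlation. This is the contrast the abstract draws with random oversampling, which only repeats existing rows, and with plain SMOTE, whose linear interpolation can produce fractional values for discrete features.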
Original language | English |
---|---|
Pages (from-to) | 878-900 |
Number of pages | 23 |
Journal | Journal of the Korean Statistical Society |
Volume | 52 |
Issue number | 4 |
Publication status | Published - Dec 2023 |
Keywords
- Bayesian analysis
- Copula methods
- Dependence analysis
- Imbalanced learning
- Oversampling