TY - GEN
T1 - VSB-DVM
T2 - 19th IEEE International Conference on Data Mining, ICDM 2019
AU - Yang, Xi
AU - Yan, Yuyao
AU - Huang, Kaizhu
AU - Zhang, Rui
N1 - Funding Information:
The work reported in this paper was partially supported by the following: National Natural Science Foundation of China (NSFC) under grant no. 61473236; Natural Science Fund for Colleges and Universities in Jiangsu Province under grant no. 17KJD520010; Suzhou Science and Technology Program under grant nos. SYG201712 and SZS201613; and Jiangsu University Natural Science Research Programme under grant no. 17KJB520041; in part by the Key Program Special Fund in XJTLU (KSF-A-01).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - The mixture of factor analyzers is a fundamental model in unsupervised learning that is particularly useful for high-dimensional data. Recent work on deep auto-encoding mixture models has made fruitful progress in clustering. In most cases, however, their performance depends heavily on the results of pre-training. Moreover, they tend to ignore prior information when making cluster assignments, leading to less strict inference and consequently limiting performance. In this paper, we propose an end-to-end Bayesian nonparametric generalization of the deep mixture model within a Variational Auto-Encoder (VAE) framework. Specifically, we develop a novel model called VSB-DVM, which exploits the Variational Stick-Breaking Process to design a Deep Variational Mixture Model. Distinct from existing deep auto-encoding mixture models, this unsupervised deep generative model learns low-dimensional representations and clusters simultaneously without pre-training. Importantly, strict inference is performed in a variational manner using the weights of the stick-breaking process. Furthermore, by capturing the richer statistical structure of the data, VSB-DVM can also generate highly realistic samples for any specified cluster. A series of qualitative and quantitative experiments is carried out on benchmark clustering and generation tasks. Comparative results show that the proposed model generates diverse, high-quality samples and achieves encouraging clustering results, outperforming state-of-the-art algorithms on four real-world datasets.
KW - Deep Embedded Clustering
KW - Finite Mixture Model
KW - Stick-breaking Prior
KW - Variational Auto-Encoder
UR - http://www.scopus.com/inward/record.url?scp=85078896507&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2019.00079
DO - 10.1109/ICDM.2019.00079
M3 - Conference Proceeding
AN - SCOPUS:85078896507
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 688
EP - 697
BT - Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
A2 - Wang, Jianyong
A2 - Shim, Kyuseok
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 November 2019 through 11 November 2019
ER -