VSB-DVM: An end-to-end Bayesian nonparametric generalization of deep variational mixture model

Xi Yang; Yuyao Yan; Kaizhu Huang; Rui Zhang

doi:10.1109/ICDM.2019.00079

Xi Yang, Yuyao Yan, Kaizhu Huang^*, Rui Zhang

^*Corresponding author for this work

University of Liverpool

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

4 Citations (Scopus)

Abstract

The mixture of factor analyzers is a fundamental model in unsupervised learning and is particularly useful for high-dimensional data. Recent work on deep auto-encoding mixture models has made fruitful progress in clustering. In most cases, however, their performance depends heavily on the results of pre-training. Moreover, they tend to ignore prior information when making cluster assignments, which leads to less strict inference and consequently limits performance. In this paper, we propose an end-to-end Bayesian nonparametric generalization of the deep mixture model within a Variational Auto-Encoder (VAE) framework. Specifically, we develop a novel model called VSB-DVM, which exploits the Variational Stick-Breaking Process to design a Deep Variational Mixture Model. Unlike existing deep auto-encoding mixture models, this unsupervised deep generative model can learn low-dimensional representations and cluster assignments simultaneously without pre-training. Importantly, a strict inference procedure is proposed that uses the weights of the stick-breaking process in a variational way. Furthermore, because it captures richer statistical structure in the data, VSB-DVM can also generate highly realistic samples for any specified cluster. A series of qualitative and quantitative experiments is carried out on benchmark clustering and generation tasks. Comparative results show that the proposed model generates diverse, high-quality samples and achieves encouraging clustering results, outperforming state-of-the-art algorithms on four real-world datasets.
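The abstract refers to the weights of a stick-breaking process. As a rough illustration only, the sketch below shows the standard truncated stick-breaking construction, in which break fractions v_k yield mixture weights pi_k = v_k * prod_{j<k} (1 - v_j). It is not the paper's variational inference procedure: the function names, the truncation level, the Beta(1, alpha) prior and the value of alpha are all assumptions made for this example.

```python
import random


def stick_breaking_weights(fractions):
    """Map break fractions v_k in [0, 1] to mixture weights pi_k.

    pi_k = v_k * prod_{j<k} (1 - v_j). The final weight takes whatever
    stick is left, so the truncated weights sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: last component gets the rest
    return weights


def sample_prior_weights(num_components, alpha, rng):
    """Draw truncated stick-breaking weights with v_k ~ Beta(1, alpha).

    num_components and alpha are hypothetical settings for illustration.
    """
    fractions = [rng.betavariate(1.0, alpha) for _ in range(num_components - 1)]
    return stick_breaking_weights(fractions)


rng = random.Random(0)
prior_weights = sample_prior_weights(num_components=10, alpha=2.0, rng=rng)

# Deterministic check: breaking half of the remaining stick each time.
fixed_weights = stick_breaking_weights([0.5, 0.5, 0.5])
```

A larger alpha tends to produce smaller break fractions and therefore spreads the mass over more components, which is what lets the number of effective clusters adapt to the data in this kind of prior.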

Original language	English
Title of host publication	Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019
Editors	Jianyong Wang, Kyuseok Shim, Xindong Wu
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	688-697
Number of pages	10
ISBN (Electronic)	9781728146034
DOIs	https://doi.org/10.1109/ICDM.2019.00079
Publication status	Published - Nov 2019
Event	19th IEEE International Conference on Data Mining, ICDM 2019, Beijing, China, 8 Nov 2019 → 11 Nov 2019

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
Volume	2019-November
ISSN (Print)	1550-4786

Conference

Conference	19th IEEE International Conference on Data Mining, ICDM 2019
Country/Territory	China
City	Beijing
Period	8/11/19 → 11/11/19

Keywords

Deep Embedded Clustering
Finite Mixture Model
Stick-breaking Prior
Variational Auto Encoder

Cite this

Yang, X., Yan, Y., Huang, K., & Zhang, R. (2019). VSB-DVM: An end-to-end Bayesian nonparametric generalization of deep variational mixture model. In J. Wang, K. Shim, & X. Wu (Eds.), Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019 (pp. 688-697). Article 8970727 (Proceedings - IEEE International Conference on Data Mining, ICDM; Vol. 2019-November). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2019.00079

@inproceedings{76f3b4163663416c8adde369afd3c358,
title = "VSB-DVM: An end-to-end Bayesian nonparametric generalization of deep variational mixture model",
keywords = "Deep Embedded Clustering, Finite Mixture Model, Stick-breaking Prior, Variational Auto Encoder",
author = "Xi Yang and Yuyao Yan and Kaizhu Huang and Rui Zhang",
note = "Funding Information: The work reported in this paper was partially supported by the following: National Natural Science Foundation of China (NSFC) under grant no. 61473236; Natural Science Fund for Colleges and Universities in Jiangsu Province under grant no. 17KJD520010; Suzhou Science and Technology Program under grant no. SY G201712, SZS201613 and Jiangsu University Natural Science Research Programme under grant no. 17KJB520041, in part by the Key Program Special Fund in XJTLU (KSF-A-01). Publisher Copyright: {\textcopyright} 2019 IEEE.; 19th IEEE International Conference on Data Mining, ICDM 2019 ; Conference date: 08-11-2019 Through 11-11-2019",
year = "2019",
month = nov,
doi = "10.1109/ICDM.2019.00079",
language = "English",
series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "688--697",
editor = "Jianyong Wang and Kyuseok Shim and Xindong Wu",
booktitle = "Proceedings - 19th IEEE International Conference on Data Mining, ICDM 2019",
}
