TY - GEN
T1 - Topic detection from microblog based on text clustering and topic model analysis
AU - Huang, Siqi
AU - Yang, Yitao
AU - Li, Huakang
AU - Sun, Guozi
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/8/3
Y1 - 2015/8/3
N2 - This paper proposes a microblog topic detection method based on text clustering and topic model analysis. Traditional topic detection methods are designed mainly for conventional media text and are not very effective on sparse, short microblog texts. Because microblog data is structured and carries rich inter-textual contextual information, such as retweets, comments, user hashtags, and embedded URLs, we first propose a feature-weight preprocessing method. We also use a clustering algorithm based on word vectors to enrich the feature information of the data. On this basis, we extend the conventional LDA (Latent Dirichlet Allocation) topic model to extract hot topics from the microblog data. Compared with traditional methods, the proposed method is much more effective on a text corpus collected from Sina Microblog.
KW - LDA
KW - Microblog
KW - text clustering
KW - topic detection
UR - http://www.scopus.com/inward/record.url?scp=84954432939&partnerID=8YFLogxK
U2 - 10.1109/APSCC.2014.18
DO - 10.1109/APSCC.2014.18
M3 - Conference Proceeding
AN - SCOPUS:84954432939
T3 - Proceedings - 2014 Asia-Pacific Services Computing Conference, APSCC 2014
SP - 88
EP - 92
BT - Proceedings - 2014 Asia-Pacific Services Computing Conference, APSCC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th Asia-Pacific Services Computing Conference, APSCC 2014
Y2 - 4 December 2014 through 6 December 2014
ER -