Topic detection from microblog based on text clustering and topic model analysis

Siqi Huang, Yitao Yang, Huakang Li, Guozi Sun*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

8 Citations (Scopus)

Abstract

This paper proposes a microblog topic detection method based on text clustering and topic model analysis. Traditional topic detection methods were designed mainly for traditional media text and are not very effective on sparse, short microblog texts. Because microblog data is structured and carries rich inter-textual contextual information, such as retweets, comments, user hashtags, and embedded URLs, we first put forward a feature-weight pre-processing method. We also use a clustering algorithm based on word vectors to enrich the feature information of the data. On this basis, we extend the conventional LDA (Latent Dirichlet Allocation) topic model to extract hot topics from the microblog data. Compared with traditional methods, the proposed method is much more effective on a text corpus collected from Sina Microblog.
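
For orientation, the conventional LDA model that the abstract takes as its starting point can be sketched with collapsed Gibbs sampling. This is a minimal illustration of plain LDA only, not the extended model or the feature-weight pre-processing described in the paper; the function name `lda_gibbs`, the hyperparameter values, and the toy posts are all assumptions made for this example.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (not the paper's extended model).

    docs: list of token lists. Returns (doc_topic, topic_word, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab

# Hypothetical, already-tokenised short posts (not data from the paper).
posts = [
    ["football", "match", "goal", "team"],
    ["team", "goal", "league", "football"],
    ["phone", "release", "camera", "battery"],
    ["battery", "phone", "screen", "release"],
]
doc_topic, topic_word, vocab = lda_gibbs(posts, num_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

With only a handful of tokens per post, the per-document counts are very sparse, which is the difficulty with short microblog texts that the abstract points to and that the feature enrichment steps are meant to address.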

Original language: English
Title of host publication: Proceedings - 2014 Asia-Pacific Services Computing Conference, APSCC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 88-92
Number of pages: 5
ISBN (Electronic): 9781479970643
Publication status: Published - 3 Aug 2015
Externally published: Yes
Event: 8th Asia-Pacific Services Computing Conference, APSCC 2014 - Fuzhou, Fu Jian, China
Duration: 4 Dec 2014 to 6 Dec 2014

Publication series

Name: Proceedings - 2014 Asia-Pacific Services Computing Conference, APSCC 2014

Conference

Conference: 8th Asia-Pacific Services Computing Conference, APSCC 2014
Country/Territory: China
City: Fuzhou, Fu Jian
Period: 4/12/14 to 6/12/14

Keywords

  • LDA
  • Microblog
  • text clustering
  • topic detection
