Covid-19 Tweets Analysis with Topic Modeling

Shichao Jia, Qi Chen, Wei Wang

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review


Social media has become an important data resource for knowledge discovery and data mining in multiple disciplines. With the rapidly growing volume of social media data, how to extract value and insights from it efficiently and effectively has become an emerging research area. Recently, various natural language processing techniques, e.g., word embedding, deep neural networks and Latent Dirichlet Allocation (LDA), have been developed for studies such as sentiment analysis, traffic event detection, natural disaster assessment and COVID-19 tweet analysis. In this paper, topic modeling through LDA was used to conduct text mining on a large real-world COVID-19 tweet dataset, which contains more than 524 million multilingual tweets and covers 218 countries over a period of 3 months. We conducted extensive experiments and visualised the insights discovered through this unsupervised process.
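To illustrate what topic modeling through LDA involves, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on a handful of tweets. It is not the authors' pipeline: the abstract does not state their preprocessing, hyperparameters or tooling, so the example tweets, the function names (`tokenize`, `fit_lda`, `top_words`) and the values of `alpha`, `beta` and `num_topics` are all assumptions chosen for illustration.

```python
import random
from collections import Counter

def tokenize(text):
    # Lowercase, replace non-letters with spaces, keep tokens longer than 2 chars.
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if len(w) > 2]

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative; hyperparameters are assumed).

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to the conditional posterior.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

def top_words(topic_word, k, n=5):
    # Most frequent words currently assigned to topic k.
    return [w for w, c in topic_word[k].most_common(n) if c > 0]

# Hypothetical example tweets, invented for this sketch (not from the dataset).
tweets = [
    "Vaccine rollout starts next week, book your vaccine appointment",
    "New vaccine trial results look promising",
    "Lockdown extended, schools closed for another month",
    "Schools and shops closed under the new lockdown rules",
]
docs = [tokenize(t) for t in tweets]
topic_word, doc_topic = fit_lda(docs, num_topics=2, seed=0)
for k in range(2):
    print(f"topic {k}:", top_words(topic_word, k))
```

On a corpus of the scale described in the abstract, a pure-Python sampler like this would be impractical; an established library implementation with online or distributed inference would be the more likely choice, though the abstract does not say which was used.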

Original language: English
Title of host publication: ICCBD 2021 - 2021 4th International Conference on Computing and Big Data
Publisher: Association for Computing Machinery
Number of pages: 7
ISBN (Electronic): 9781450387194
Publication status: Published - 27 Nov 2021
Event: 4th International Conference on Computing and Big Data, ICCBD 2021 - Virtual, Online, China
Duration: 27 Nov 2021 → 29 Nov 2021

Publication series

Name: ACM International Conference Proceeding Series


Conference: 4th International Conference on Computing and Big Data, ICCBD 2021
City: Virtual, Online


  • Covid-19
  • LDA
  • Social media analysis

