Distributed microblog crawler system based on P2P

Yang Lu, Huakang Li, Guozi Sun

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Microblogs are becoming a main medium for spreading public information, and analyzing microblog data helps researchers follow public information in a timely manner. Collecting microblog data effectively is therefore important. Traditional web crawlers cannot retrieve complete information, and the API imposes many restrictions; to address both problems, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login and assigns tasks according to user position information, so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.

Original language: English
Pages (from-to): 296-301
Number of pages: 6
Journal: Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition)
Volume: 37
Issue number: 3
Publication status: Published - 1 May 2016
Externally published: Yes

Keywords

  • Distributed
  • Microblog
  • P2P
  • Simulated login
  • Web crawler
