Abstract
Microblogs are becoming a main medium for spreading public information, and analyzing microblog data helps researchers follow that information in a timely manner. Collecting microblog data effectively is therefore important. Traditional web crawlers cannot retrieve complete information, and the API imposes many restrictions. To address these problems, a P2P-based distributed crawler system was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information, so that data can be collected efficiently and continuously. Comparison with other architectures shows that the proposed system performs well and provides adequate data.
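The abstract does not spell out how tasks are assigned, so the following is only a minimal sketch of one way position-based assignment across peers could look. It assumes that "user position information" is a location label such as a city name, and that each peer takes the users whose location hashes to it. The names `assign_peer` and `partition_tasks`, the peer identifiers, and the hashing scheme are all hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict


def assign_peer(location: str, peers: list[str]) -> str:
    """Map a location label to one peer with a stable hash.

    Hypothetical scheme: the paper's actual assignment rule is not
    given in the abstract.
    """
    digest = hashlib.sha256(location.encode("utf-8")).hexdigest()
    return peers[int(digest, 16) % len(peers)]


def partition_tasks(users: dict[str, str], peers: list[str]) -> dict[str, list[str]]:
    """Group user IDs by the peer responsible for their location."""
    tasks: dict[str, list[str]] = defaultdict(list)
    for user_id, location in users.items():
        tasks[assign_peer(location, peers)].append(user_id)
    return dict(tasks)


if __name__ == "__main__":
    # Illustrative data only: user IDs, locations, and peer names are made up.
    peers = ["peer-a", "peer-b", "peer-c"]
    users = {"u1": "Beijing", "u2": "Shanghai", "u3": "Beijing", "u4": "Zhenjiang"}
    for peer, user_ids in partition_tasks(users, peers).items():
        print(peer, user_ids)
```

Because the hash is stable, every peer computes the same assignment without a central coordinator, and users sharing a location always land on the same peer. A plain modulo scheme reshuffles most assignments when a peer joins or leaves; a real P2P deployment would more likely use consistent hashing or a DHT for that reason.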
Original language | English |
---|---|
Pages (from-to) | 296-301 |
Number of pages | 6 |
Journal | Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition) |
Volume | 37 |
Issue number | 3 |
DOIs | |
Publication status | Published - 1 May 2016 |
Externally published | Yes |
Keywords
- Distributed
- Microblog
- P2P
- Simulated login
- Web crawler