Distributed microblog crawler system based on P2P

Yang Lu; Huakang Li; Guozi Sun

doi:10.3969/j.issn.1671-7775.2016.03.008

Nanjing University of Posts and Telecommunications

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Microblogs are becoming the main medium for spreading public information, and analyzing microblog data helps researchers follow that information in a timely manner. Collecting microblog data effectively is therefore important. To address two problems, that a traditional web crawler cannot retrieve complete information and that the API imposes many restrictions, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.

Original language	English
Pages (from-to)	296-301
Number of pages	6
Journal	Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition)
Volume	37
Issue number	3
DOIs	https://doi.org/10.3969/j.issn.1671-7775.2016.03.008
Publication status	Published - 1 May 2016
Externally published	Yes

Keywords

Distributed
Microblog
P2P
Simulated login
Web crawler

Cite this

@article{8c0a4b39bf7a4c4193142aae811e249d,

title = "Distributed microblog crawler system based on P2P",

abstract = "Microblogs are becoming the main medium for spreading public information, and analyzing microblog data helps researchers follow that information in a timely manner. Collecting microblog data effectively is therefore important. To address two problems, that a traditional web crawler cannot retrieve complete information and that the API imposes many restrictions, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.",

keywords = "Distributed, Microblog, P2P, Simulated login, Web crawler",

author = "Yang Lu and Huakang Li and Guozi Sun",

year = "2016",

month = may,

day = "1",

doi = "10.3969/j.issn.1671-7775.2016.03.008",

language = "English",

volume = "37",

pages = "296--301",

journal = "Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition)",

issn = "1671-7775",

number = "3",

}

TY - JOUR

T1 - Distributed microblog crawler system based on P2P

AU - Lu, Yang

AU - Li, Huakang

AU - Sun, Guozi

PY - 2016/5/1

Y1 - 2016/5/1

N2 - Microblogs are becoming the main medium for spreading public information, and analyzing microblog data helps researchers follow that information in a timely manner. Collecting microblog data effectively is therefore important. To address two problems, that a traditional web crawler cannot retrieve complete information and that the API imposes many restrictions, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.

AB - Microblogs are becoming the main medium for spreading public information, and analyzing microblog data helps researchers follow that information in a timely manner. Collecting microblog data effectively is therefore important. To address two problems, that a traditional web crawler cannot retrieve complete information and that the API imposes many restrictions, a distributed crawler system based on P2P was designed for SINA microblog. The crawler uses simulated login technology and assigns tasks according to user position information so that data can be collected efficiently and continuously. Comparison with other structures shows that the proposed system performs well and provides adequate data.

KW - Distributed

KW - Microblog

KW - P2P

KW - Simulated login

KW - Web crawler

UR - http://www.scopus.com/inward/record.url?scp=84967144602&partnerID=8YFLogxK

U2 - 10.3969/j.issn.1671-7775.2016.03.008

DO - 10.3969/j.issn.1671-7775.2016.03.008

M3 - Article

AN - SCOPUS:84967144602

SN - 1671-7775

VL - 37

SP - 296

EP - 301

JO - Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition)

JF - Jiangsu Daxue Xuebao (Ziran Kexue Ban) / Journal of Jiangsu University (Natural Science Edition)

IS - 3

ER -
