Boosting the performance of web spam detection with ensemble under-sampling classification

Guang Gang Geng*, Chun Heng Wang, Qiu Dan Li, Lei Xu, Xiao Bo Jin

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

39 Citations (Scopus)

Abstract

Anti-spam has become one of the top challenges for Web search. In this paper, we treat web spam detection as a binary classification problem. Because reputable pages are easier to obtain than spam pages on the Web, we adopt an ensemble under-sampling classification strategy that takes full advantage of the information contained in the large number of reputable websites. The strategy is based on the spamicity predicted by each sub-classifier, and both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark showed that the ensemble strategy can effectively improve web spam detection performance.
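The abstract does not specify the base learner, the exact features, or how the sub-classifiers' spamicity scores are combined. The sketch below illustrates the general ensemble under-sampling idea under stated assumptions: the reputable pages are shuffled and split into disjoint, roughly spam-sized subsets; each subset is paired with all spam pages to train one sub-classifier; and the final spamicity is the plain average of the sub-classifiers' predicted probabilities. The logistic-regression base learner, the averaging rule, the two-feature toy vectors, and all function names (`train_under_sampling_ensemble`, `ensemble_spamicity`, `make_toy_data`) are hypothetical stand-ins, not the authors' method.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Model = Tuple[List[float], float]  # (weights, bias) of one sub-classifier


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs: List[Vector], ys: List[int], epochs: int = 300, lr: float = 0.1) -> Model:
    # Hypothetical base learner: logistic regression fitted by batch gradient descent.
    dim = len(xs[0])
    w, b, n = [0.0] * dim, 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def spamicity(model: Model, x: Vector) -> float:
    # Predicted probability that page x is spam under one sub-classifier.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_under_sampling_ensemble(spam: List[Vector], reputable: List[Vector], seed: int = 0) -> List[Model]:
    # Split the larger reputable set into k disjoint, roughly spam-sized subsets
    # and train one sub-classifier per subset against *all* spam pages.
    if not spam or not reputable:
        raise ValueError("need at least one spam and one reputable example")
    pool = list(reputable)
    random.Random(seed).shuffle(pool)
    k = max(1, len(pool) // len(spam))
    subsets = [pool[i::k] for i in range(k)]
    return [train_logistic(spam + sub, [1] * len(spam) + [0] * len(sub)) for sub in subsets]


def ensemble_spamicity(models: List[Model], x: Vector) -> float:
    # Assumed combination rule: average the sub-classifiers' spamicity scores.
    return sum(spamicity(m, x) for m in models) / len(models)


def make_toy_data(seed: int = 1) -> Tuple[List[Vector], List[Vector]]:
    # Synthetic two-feature vectors standing in for content/link features:
    # 10 spam pages near (0.8, 0.8), 50 reputable pages near (0.2, 0.2).
    rng = random.Random(seed)
    spam = [[0.8 + rng.uniform(-0.1, 0.1), 0.8 + rng.uniform(-0.1, 0.1)] for _ in range(10)]
    reputable = [[0.2 + rng.uniform(-0.1, 0.1), 0.2 + rng.uniform(-0.1, 0.1)] for _ in range(50)]
    return spam, reputable


if __name__ == "__main__":
    spam_pages, reputable_pages = make_toy_data()
    models = train_under_sampling_ensemble(spam_pages, reputable_pages)
    print(len(models), "sub-classifiers")
    print("spam-like page:", round(ensemble_spamicity(models, [0.9, 0.9]), 3))
    print("reputable-like page:", round(ensemble_spamicity(models, [0.1, 0.1]), 3))
```

Splitting the reputable pages into disjoint subsets, rather than discarding most of them as plain under-sampling would, lets every reputable example contribute to some sub-classifier while each sub-classifier still trains on a roughly balanced set.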

Original language: English
Title of host publication: Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
Pages: 583-587
Number of pages: 5
Publication status: Published - 2007
Externally published: Yes
Event: 4th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007 - Haikou, China
Duration: 24 Aug 2007 to 27 Aug 2007

Publication series

Name: Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
Volume: 4

Conference

Conference: 4th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
Country/Territory: China
City: Haikou
Period: 24/08/07 to 27/08/07
