TY - GEN
T1 - Boosting the performance of web spam detection with ensemble under-sampling classification
AU - Geng, Guang Gang
AU - Wang, Chun Heng
AU - Li, Qiu Dan
AU - Xu, Lei
AU - Jin, Xiao Bo
PY - 2007
Y1 - 2007
AB - Anti-spam has become one of the top challenges for Web search. In this paper, we treat web spam detection as a binary classification problem. Because reputable pages on the Web are easier to obtain than spam pages, we adopt an ensemble under-sampling classification strategy that takes full advantage of the information contained in the large number of reputable websites. The strategy is based on the predicted spamicity of each sub-classifier, and both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark show that the ensemble strategy effectively improves web spam detection performance.
UR - http://www.scopus.com/inward/record.url?scp=44049101329&partnerID=8YFLogxK
U2 - 10.1109/FSKD.2007.207
DO - 10.1109/FSKD.2007.207
M3 - Conference Proceeding
AN - SCOPUS:44049101329
SN - 0769528740
SN - 9780769528748
T3 - Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
SP - 583
EP - 587
BT - Proceedings - Fourth International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
T2 - 4th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2007
Y2 - 24 August 2007 through 27 August 2007
ER -