Spam detection via feature selection and decision tree

Yudong Zhang*, Shuihua Wang, Lenan Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

To construct a spam detector that emphasizes reducing the error of mislabeling nonspam as spam, a hybrid spam detection system is proposed. A wrapper-based feature selection method extracts the important features. The F-measure is set as the objective function because it combines the recall and precision indicators. Particle swarm optimization (PSO) is used to accelerate the search procedure. The C4.5 decision tree is employed for its excellent classification capability. K-fold cross validation is used to improve generalization. Results on a data set of 5600 emails demonstrate that the error of misclassifying nonspam as spam is only 1%, better than the traditional method.
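The wrapper objective described in the abstract can be illustrated with a minimal sketch: score a candidate feature subset by the mean F-measure of a classifier under K-fold cross validation. This is not the authors' implementation. The names `f_measure`, `nonspam_error_rate`, `k_fold_indices`, `wrapper_objective`, and `threshold_stub` are hypothetical, the label convention (1 = spam, 0 = nonspam) is an assumption, and `threshold_stub` is a trivial stand-in for the C4.5 tree. The PSO search over feature masks is not shown.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
# Hypothetical classifier interface: (train_X, train_y, test_X) -> predicted labels.
Fit = Callable[[List[Row], List[int], List[Row]], List[int]]


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure (harmonic mean of precision and recall), spam = label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def nonspam_error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of nonspam (label 0) emails that were mislabeled as spam."""
    nonspam_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(nonspam_preds) / len(nonspam_preds) if nonspam_preds else 0.0


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k interleaved folds (no shuffling, for simplicity)."""
    return [list(range(i, n, k)) for i in range(k)]


def wrapper_objective(mask: Sequence[int], X: List[Row], y: List[int],
                      fit: Fit, k: int = 5) -> float:
    """Mean cross-validated F-measure using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot be scored
    Xs = [[row[j] for j in cols] for row in X]
    scores = []
    for fold in k_fold_indices(len(Xs), k):
        held_out = set(fold)
        train = [i for i in range(len(Xs)) if i not in held_out]
        pred = fit([Xs[i] for i in train], [y[i] for i in train],
                   [Xs[i] for i in fold])
        scores.append(f_measure([y[i] for i in fold], pred))
    return sum(scores) / len(scores)


def threshold_stub(train_X: List[Row], train_y: List[int],
                   test_X: List[Row]) -> List[int]:
    """Stand-in for C4.5 (assumption): spam if the first selected feature > 0.5."""
    return [1 if row[0] > 0.5 else 0 for row in test_X]
```

A search procedure such as PSO would call `wrapper_objective` on many candidate masks and keep the highest-scoring one; `nonspam_error_rate` reports the kind of error the abstract emphasizes, separately from the objective itself.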

Original language: English
Pages (from-to): 726-730
Number of pages: 5
Journal: Advanced Science Letters
Volume: 5
Issue number: 2
Publication status: Published - 2012
Externally published: Yes

Keywords

  • C4.5 algorithm
  • Feature selection
  • K-fold cross validation
  • Particle swarm optimization
  • Spam detection
  • Wrapper
