Abstract
To build a spam detector that emphasizes reducing the error of mislabeling nonspam as spam, we propose a hybrid spam detection system. A wrapper-based feature selection method extracts the important features. The F-measure is set as the objective function because it combines the recall and precision indicators. Particle swarm optimization (PSO) is used to accelerate the search procedure. The C4.5 decision tree is employed as the classifier because of its excellent classification capability. K-fold cross validation is used to improve generalization. Results on a data set of 5600 emails demonstrate that the error of misclassifying nonspam as spam is only 1%, better than the traditional method.
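As a rough illustration of the objective described in the abstract, the sketch below scores one candidate feature subset by its mean F-measure over K folds. It is not the authors' implementation: treating spam as the positive class (label 1), the contiguous fold split, and the names `f_measure`, `k_fold_indices`, `fitness`, and `train_and_predict` are all assumptions made for this example. The `train_and_predict` callback stands in for a C4.5 learner, and a PSO search would call `fitness` on each particle's binary feature mask.

```python
from typing import Callable, List, Sequence, Tuple


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-measure (harmonic mean of precision and recall), spam = label 1 (assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # nonspam flagged as spam
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous (train, test) folds (no shuffling)."""
    folds = []
    for i in range(k):
        start, stop = i * n // k, (i + 1) * n // k
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        folds.append((train, test))
    return folds


def fitness(
    mask: Sequence[int],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    train_and_predict: Callable[[list, list, list], List[int]],
    k: int = 5,
) -> float:
    """Wrapper objective: mean F-measure over k folds using only the masked features.

    train_and_predict(X_train, y_train, X_test) is a hypothetical stand-in
    for training a classifier such as C4.5 and predicting the test labels.
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot be evaluated
    Xs = [[row[j] for j in cols] for row in X]
    scores = []
    for train, test in k_fold_indices(len(Xs), k):
        y_pred = train_and_predict(
            [Xs[i] for i in train], [y[i] for i in train], [Xs[i] for i in test]
        )
        scores.append(f_measure([y[i] for i in test], y_pred))
    return sum(scores) / len(scores)
```

Because precision falls whenever nonspam is flagged as spam, maximizing the F-measure penalizes exactly the error the abstract emphasizes, while the recall term keeps the detector from trivially labeling everything as nonspam.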
Original language | English |
---|---|
Pages (from-to) | 726-730 |
Number of pages | 5 |
Journal | Advanced Science Letters |
Volume | 5 |
Issue number | 2 |
Publication status | Published - 2012 |
Externally published | Yes |
Keywords
- C4.5 algorithm
- Feature selection
- K-fold cross validation
- Particle swarm optimization
- Spam detection
- Wrapper