Boosting the phishing detection performance by semantic analysis

Xi Zhang, Yu Zeng, Xiao Bo Jin, Zhi Wei Yan, Guang Gang Geng

Research output: Chapter in Book or Report/Conference proceeding (Conference Proceeding, peer-reviewed)

28 Citations (Scopus)

Abstract

Phishing has become increasingly severe in recent years and seriously threatens the privacy and property of Internet users. Phishing is essentially brand counterfeiting: to deceive victims effectively, phishing sites are made visually and semantically highly similar to the legitimate sites they imitate. In recent years, machine-learning-based methods have become the mainstream approach to anti-phishing, and their effectiveness hinges on the statistical features extracted. However, existing statistical features mainly capture visual similarity, information stealing, and the use of third-party services, and they ignore the semantic information of web pages. We therefore extract a series of semantic features with word2vec to better characterize phishing sites, and further fuse them with other multi-scale statistical features to construct a more robust phishing detection model. Experimental results on real-world datasets show that most phishing websites can be identified effectively by mining the semantic features of word embeddings alone. Detection models based on the fused features achieve the best results, which shows that the semantic features and the other statistical features complement each other well. The proposed method offers a promising approach to phishing detection in real Internet environments and effectively boosts detection performance.
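The abstract does not specify how word2vec vectors are turned into page-level features or how the fusion is performed. One common approach, assumed here rather than taken from the paper, is to average the embeddings of the words on a page and concatenate that vector with statistical features (early fusion). The sketch below uses only the Python standard library; the helper names `tokenize`, `semantic_vector`, and `fuse_features`, the toy embedding table, and the example statistical features are all hypothetical.

```python
import re
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a simple stand-in for real HTML text extraction.
    return re.findall(r"[a-z0-9]+", text.lower())


def semantic_vector(text: str, embeddings: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Average the word vectors of in-vocabulary tokens (assumed aggregation).

    Returns a zero vector when no token is in the vocabulary.
    `dim` must match the length of every vector in `embeddings`.
    """
    total = [0.0] * dim
    count = 0
    for tok in tokenize(text):
        vec = embeddings.get(tok)
        if vec is None:
            continue  # out-of-vocabulary words are skipped
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


def fuse_features(semantic: Sequence[float], statistical: Sequence[float]) -> List[float]:
    # Hypothetical early fusion: concatenate semantic and statistical features
    # into one vector that a downstream classifier could consume.
    return list(semantic) + list(statistical)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings standing in for a trained word2vec model.
    toy_embeddings = {"verify": [1.0, 0.0], "account": [0.0, 1.0]}
    sem = semantic_vector("Please verify your account", toy_embeddings, dim=2)
    # Hypothetical statistical features, e.g. [has_password_field, num_external_links].
    stats = [1.0, 3.0]
    print(fuse_features(sem, stats))  # prints [0.5, 0.5, 1.0, 3.0]
```

Only "verify" and "account" are in the toy vocabulary, so the semantic vector is their mean, [0.5, 0.5]. Concatenation is the simplest fusion strategy; the paper may combine features differently.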

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1063-1070
Number of pages: 8
ISBN (Electronic): 9781538627143
Publication status: Published - 1 Jul 2017
Externally published: Yes
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 11/12/17 to 14/12/17

Keywords

  • deep learning
  • phishing detection
  • semantic analysis
  • statistical feature
  • word embeddings
