An efficient multistage phishing website detection model based on the CASE feature framework: Aiming at the real web environment

doi:10.1016/j.cose.2021.102421

Dong Jie Liu, Guang Gang Geng^*, Xiao Bo Jin, Wei Wang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)

Abstract

Phishing has become a favorite method of hackers for committing data theft and continues to evolve. As long as phishing websites continue to operate, many more people and companies will suffer privacy leaks or financial losses. Therefore, the demand for fast and accurate phishing website detection grows stronger. However, the existing phishing detection methods do not fully analyze the features of phishing, and the performance and efficiency of the models only apply to certain limited datasets and need to be improved to be applied to the real web environment. This paper fully considers the social engineering principles of phishing, proposes a comprehensive and interpretable CASE feature framework and designs a multistage phishing detection model to effectively detect phishing sites, especially in the real web environment, where high efficiency and performance and extremely low false alarm rates are required. To fully verify the proposed method, two kinds of data experiments were carried out. One was the comparative experiments among different features and different detection models on CASE, which covers both classic machine learning and deep learning algorithms based on a constructed complex dataset. The other was a one-year phishing discovery experiment in the real web environment. The proposed method achieves better detection results under the premise of significantly shortening the execution time and works well in real phishing discovery, which proves its high practicability in reality.

Original language	English
Article number	102421
Journal	Computers and Security
Volume	110
DOIs	https://doi.org/10.1016/j.cose.2021.102421
Publication status	Published - Nov 2021

Keywords

CASE feature framework
Machine learning
Multistage model
Phishing detection
Real web environment

Cite this

@article{2b93b10c29e446ab894a84bfaf77505f,

title = "An efficient multistage phishing website detection model based on the {CASE} feature framework: Aiming at the real web environment",

abstract = "Phishing has become a favorite method of hackers for committing data theft and continues to evolve. As long as phishing websites continue to operate, many more people and companies will suffer privacy leaks or financial losses. Therefore, the demand for fast and accurate phishing website detection grows stronger. However, the existing phishing detection methods do not fully analyze the features of phishing, and the performance and efficiency of the models only apply to certain limited datasets and need to be improved to be applied to the real web environment. This paper fully considers the social engineering principles of phishing, proposes a comprehensive and interpretable CASE feature framework and designs a multistage phishing detection model to effectively detect phishing sites, especially in the real web environment, where high efficiency and performance and extremely low false alarm rates are required. To fully verify the proposed method, two kinds of data experiments were carried out. One was the comparative experiments among different features and different detection models on CASE, which covers both classic machine learning and deep learning algorithms based on a constructed complex dataset. The other was a one-year phishing discovery experiment in the real web environment. The proposed method achieves better detection results under the premise of significantly shortening the execution time and works well in real phishing discovery, which proves its high practicability in reality.",

keywords = "CASE feature framework, Machine learning, Multistage model, Phishing detection, Real web environment",

author = "Liu, {Dong Jie} and Geng, {Guang Gang} and Jin, {Xiao Bo} and Wang, Wei",

note = "Publisher Copyright: {\textcopyright} 2021",

year = "2021",

month = nov,

doi = "10.1016/j.cose.2021.102421",

language = "English",

volume = "110",

journal = "Computers and Security",

issn = "0167-4048",

}

TY - JOUR

T1 - An efficient multistage phishing website detection model based on the CASE feature framework

T2 - Aiming at the real web environment

AU - Liu, Dong Jie

AU - Geng, Guang Gang

AU - Jin, Xiao Bo

AU - Wang, Wei

PY - 2021/11

Y1 - 2021/11

AB - Phishing has become a favorite method of hackers for committing data theft and continues to evolve. As long as phishing websites continue to operate, many more people and companies will suffer privacy leaks or financial losses. Therefore, the demand for fast and accurate phishing website detection grows stronger. However, the existing phishing detection methods do not fully analyze the features of phishing, and the performance and efficiency of the models only apply to certain limited datasets and need to be improved to be applied to the real web environment. This paper fully considers the social engineering principles of phishing, proposes a comprehensive and interpretable CASE feature framework and designs a multistage phishing detection model to effectively detect phishing sites, especially in the real web environment, where high efficiency and performance and extremely low false alarm rates are required. To fully verify the proposed method, two kinds of data experiments were carried out. One was the comparative experiments among different features and different detection models on CASE, which covers both classic machine learning and deep learning algorithms based on a constructed complex dataset. The other was a one-year phishing discovery experiment in the real web environment. The proposed method achieves better detection results under the premise of significantly shortening the execution time and works well in real phishing discovery, which proves its high practicability in reality.

KW - CASE feature framework

KW - Machine learning

KW - Multistage model

KW - Phishing detection

KW - Real web environment

UR - http://www.scopus.com/inward/record.url?scp=85112018738&partnerID=8YFLogxK

U2 - 10.1016/j.cose.2021.102421

DO - 10.1016/j.cose.2021.102421

M3 - Article

AN - SCOPUS:85112018738

SN - 0167-4048

VL - 110

JO - Computers and Security

JF - Computers and Security

M1 - 102421

ER -
