TY - JOUR
T1 - An efficient multistage phishing website detection model based on the CASE feature framework
T2 - Aiming at the real web environment
AU - Liu, Dong Jie
AU - Geng, Guang Gang
AU - Jin, Xiao Bo
AU - Wang, Wei
N1 - Publisher Copyright: © 2021
PY - 2021/11
Y1 - 2021/11
AB - Phishing has become a favorite method of hackers for committing data theft and continues to evolve. As long as phishing websites continue to operate, many more people and companies will suffer privacy leaks or financial losses. Therefore, the demand for fast and accurate phishing website detection grows stronger. However, existing phishing detection methods do not fully analyze the features of phishing, and the models' performance and efficiency hold only on certain limited datasets and must be improved before they can be applied in the real web environment. This paper fully considers the social engineering principles of phishing, proposes a comprehensive and interpretable CASE feature framework, and designs a multistage phishing detection model to effectively detect phishing sites, especially in the real web environment, where high efficiency, strong performance, and an extremely low false alarm rate are required. To fully verify the proposed method, two kinds of experiments were carried out. The first was a set of comparative experiments across different features and different detection models on CASE, covering both classic machine learning and deep learning algorithms, using a constructed complex dataset. The second was a one-year phishing discovery experiment in the real web environment. The proposed method achieves better detection results while significantly shortening execution time and works well in real phishing discovery, demonstrating its high practicality.
KW - CASE feature framework
KW - Machine learning
KW - Multistage model
KW - Phishing detection
KW - Real web environment
UR - http://www.scopus.com/inward/record.url?scp=85112018738&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2021.102421
DO - 10.1016/j.cose.2021.102421
M3 - Article
AN - SCOPUS:85112018738
SN - 0167-4048
VL - 110
JO - Computers and Security
JF - Computers and Security
M1 - 102421
ER -