TY - GEN
T1 - Classification and Identification of Phishing Websites based on Machine Learning
AU - Fang, Sheng
AU - Liu, Tianyang
AU - Zhu, Yaning
AU - Fan, Wenjun
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Phishing was the largest network security issue among global cybercrimes in 2022. Its frequency has continued to grow rapidly, making it one of the most important network security issues. State-of-the-art approaches in this field face a trade-off between high-precision discriminant models and heavy consumption of computing resources. The purpose of this article is therefore to balance accuracy against computing resources (performance) so that both accuracy and computing efficiency are achieved. The article applies principal component analysis (PCA) for dimensionality reduction, compressing the original feature set of the sample data, and then runs experiments with different machine learning models. The random forest model after PCA achieved a discrimination accuracy of 97.157% with a performance improvement of 25.1%, effectively balancing accuracy and performance.
KW - Cyber Crime
KW - Machine Learning
KW - PCA
KW - Phishing Detection
KW - Random Forest
UR - http://www.scopus.com/inward/record.url?scp=85186756050&partnerID=8YFLogxK
U2 - 10.1109/CyberC58899.2023.00068
DO - 10.1109/CyberC58899.2023.00068
M3 - Conference Proceeding
AN - SCOPUS:85186756050
T3 - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
SP - 397
EP - 403
BT - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
Y2 - 2 November 2023 through 4 November 2023
ER -