Classification and Identification of Phishing Websites based on Machine Learning

Sheng Fang; Tianyang Liu; Yaning Zhu; Wenjun Fan

doi:10.1109/CyberC58899.2023.00068

Sheng Fang, Tianyang Liu, Yaning Zhu, Wenjun Fan*

*Corresponding author for this work

School of Advanced Technology

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Phishing was the largest network security issue among global cybercrimes in 2022, and its frequency continues to grow rapidly. State-of-the-art approaches in this field face a trade-off between high-precision discriminant models and heavy consumption of computing resources. This article therefore aims to balance accuracy against computing cost (performance), so that accuracy and computational efficiency are achieved together. It uses principal component analysis (PCA) to reduce the dimensionality of the sample data, compressing the original feature set, and then runs experiments with different machine learning models. The random forest model after PCA achieved a discrimination accuracy of 97.157% with a performance improvement of 25.1%, balancing accuracy and performance.
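
The workflow described in the abstract (compress the feature set with PCA, then train a classifier and compare accuracy and runtime against the uncompressed baseline) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the synthetic dataset, the 30-feature input, the choice of 10 principal components, the scaling step, and the helper name `evaluate` are all assumptions made for the example, and the numbers it prints are unrelated to the results reported in the paper.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model, then return (test accuracy, fit-plus-predict seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, predictions), elapsed


# Synthetic stand-in for a phishing feature table (NOT the paper's dataset).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: random forest trained on the full feature set.
baseline = RandomForestClassifier(n_estimators=100, random_state=0)

# Reduced: standardize, project onto 10 principal components (an assumed
# value), then train the same random forest on the compressed features.
reduced = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

baseline_acc, baseline_time = evaluate(baseline, X_train, X_test, y_train, y_test)
reduced_acc, reduced_time = evaluate(reduced, X_train, X_test, y_train, y_test)

print(f"baseline: accuracy={baseline_acc:.3f}, time={baseline_time:.3f}s")
print(f"with PCA: accuracy={reduced_acc:.3f}, time={reduced_time:.3f}s")
```

Wrapping the scaler, PCA, and classifier in one pipeline means the projection is fitted on the training split only, which avoids leaking test data into the dimensionality reduction. Whether PCA actually shortens runtime depends on the dataset and model settings, so both timings should be measured rather than assumed.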

Original language	English
Title of host publication	Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	397-403
Number of pages	7
ISBN (Electronic)	9798350308693
DOIs	https://doi.org/10.1109/CyberC58899.2023.00068
Publication status	Published - 2023
Event	15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023 - Jiangsu, China Duration: 2 Nov 2023 → 4 Nov 2023

Publication series

Name	Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023

Conference

Conference	15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
Country/Territory	China
City	Jiangsu
Period	2/11/23 → 4/11/23

Keywords

Cyber Crime
Machine Learning
PCA
Phishing Detection
Random Forest



Cite this

Fang, S., Liu, T., Zhu, Y., & Fan, W. (2023). Classification and Identification of Phishing Websites based on Machine Learning. In Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023 (pp. 397-403). (Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CyberC58899.2023.00068

Fang, Sheng ; Liu, Tianyang ; Zhu, Yaning et al. / Classification and Identification of Phishing Websites based on Machine Learning. Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 397-403 (Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023).

@inproceedings{ad0fb690dc0043dfa3ff7c6071294f27,

title = "Classification and Identification of Phishing Websites based on Machine Learning",

abstract = "Phishing is the largest network security issue among global cybercrimes in 2022. Its frequency of occurrence has maintained rapid growth and has become one of the most important network security issues. In the state of the art of this research field, there was a trade-off between high-precision discriminant models and huge consumption of computing resources. Therefore, the research purpose of this article is mainly to balance the relationship between accuracy and computing resources (performance) to achieve accuracy and computing efficiency at the same time. This article uses principal component analysis (PCA) as a tool, uses its excellent dimensionality reduction ability to process sample data, compresses the original feature set, and then uses different machine learning models to conduct experiments. In the end, the random forest model after PCA achieved a discrimination accuracy of 97.157% with a performance improvement of 25.1%, effectively achieving a win-win balance between accuracy and performance.",

keywords = "Cyber Crime, Machine Learning, PCA, Phishing Detection, Random Forest",

author = "Sheng Fang and Tianyang Liu and Yaning Zhu and Wenjun Fan",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023 ; Conference date: 02-11-2023 Through 04-11-2023",

year = "2023",

doi = "10.1109/CyberC58899.2023.00068",

language = "English",

series = "Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "397--403",

booktitle = "Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023",

}

Fang, S, Liu, T, Zhu, Y & Fan, W 2023, Classification and Identification of Phishing Websites based on Machine Learning. in Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023. Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023, Institute of Electrical and Electronics Engineers Inc., pp. 397-403, 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023, Jiangsu, China, 2/11/23. https://doi.org/10.1109/CyberC58899.2023.00068

Classification and Identification of Phishing Websites based on Machine Learning. / Fang, Sheng; Liu, Tianyang; Zhu, Yaning et al.
Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 397-403 (Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023).

TY - GEN

T1 - Classification and Identification of Phishing Websites based on Machine Learning

AU - Fang, Sheng

AU - Liu, Tianyang

AU - Zhu, Yaning

AU - Fan, Wenjun

PY - 2023

Y1 - 2023

N2 - Phishing is the largest network security issue among global cybercrimes in 2022. Its frequency of occurrence has maintained rapid growth and has become one of the most important network security issues. In the state of the art of this research field, there was a trade-off between high-precision discriminant models and huge consumption of computing resources. Therefore, the research purpose of this article is mainly to balance the relationship between accuracy and computing resources (performance) to achieve accuracy and computing efficiency at the same time. This article uses principal component analysis (PCA) as a tool, uses its excellent dimensionality reduction ability to process sample data, compresses the original feature set, and then uses different machine learning models to conduct experiments. In the end, the random forest model after PCA achieved a discrimination accuracy of 97.157% with a performance improvement of 25.1%, effectively achieving a win-win balance between accuracy and performance.

AB - Phishing is the largest network security issue among global cybercrimes in 2022. Its frequency of occurrence has maintained rapid growth and has become one of the most important network security issues. In the state of the art of this research field, there was a trade-off between high-precision discriminant models and huge consumption of computing resources. Therefore, the research purpose of this article is mainly to balance the relationship between accuracy and computing resources (performance) to achieve accuracy and computing efficiency at the same time. This article uses principal component analysis (PCA) as a tool, uses its excellent dimensionality reduction ability to process sample data, compresses the original feature set, and then uses different machine learning models to conduct experiments. In the end, the random forest model after PCA achieved a discrimination accuracy of 97.157% with a performance improvement of 25.1%, effectively achieving a win-win balance between accuracy and performance.

KW - Cyber Crime

KW - Machine Learning

KW - PCA

KW - Phishing Detection

KW - Random Forest

UR - http://www.scopus.com/inward/record.url?scp=85186756050&partnerID=8YFLogxK

U2 - 10.1109/CyberC58899.2023.00068

DO - 10.1109/CyberC58899.2023.00068

M3 - Conference Proceeding

AN - SCOPUS:85186756050

T3 - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023

SP - 397

EP - 403

BT - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023

Y2 - 2 November 2023 through 4 November 2023

ER -

Fang S, Liu T, Zhu Y, Fan W. Classification and Identification of Phishing Websites based on Machine Learning. In Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 397-403. (Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023). doi: 10.1109/CyberC58899.2023.00068
