Identification of phishing websites through hyperlink analysis and rule extraction

Chaoqun Wang; Zhongyi Hu; Raymond Chiong; Yukun Bao; Jiang Wu

doi:10.1108/EL-01-2020-0016

Chaoqun Wang, Zhongyi Hu^*, Raymond Chiong, Yukun Bao, Jiang Wu

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Purpose: The aim of this study is to propose an efficient rule extraction and integration approach for identifying phishing websites. The proposed approach can elucidate patterns of phishing websites and identify them accurately.

Design/methodology/approach: Hyperlink indicators along with URL-based features are used to build the identification model. In the proposed approach, very simple rules are first extracted based on individual features to provide meaningful and easy-to-understand rules. Then, the F-measure score is used to select high-quality rules for identifying phishing websites. To construct a reliable and promising phishing website identification model, the selected rules are integrated using a simple neural network model.

Findings: Experiments conducted using self-collected and benchmark data sets show that the proposed approach outperforms 16 commonly used classifiers (including seven non–rule-based and four rule-based classifiers as well as five deep learning models) in terms of interpretability and identification performance.

Originality/value: Investigating patterns of phishing websites based on hyperlink indicators using the efficient rule-based approach is innovative. It is not only helpful for identifying phishing websites, but also beneficial for extracting simple and understandable rules.
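
The three steps named in the abstract (single-feature rules, F-measure selection, integration by a simple neural network) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the two feature columns, the toy data, the midpoint thresholds, the 0.7 F-measure cut-off, and the use of a single perceptron unit as the "simple neural network".

```python
"""Sketch: single-feature rules -> F-measure selection -> perceptron integration."""

# Toy data (made up). Each row holds two hypothetical hyperlink indicators:
# [share of hyperlinks pointing to other domains, share of empty hyperlinks].
X = [[0.9, 0.6], [0.8, 0.5], [0.7, 0.7], [0.6, 0.1],   # labelled phishing
     [0.1, 0.0], [0.3, 0.1], [0.2, 0.2], [0.4, 0.0]]   # labelled legitimate
y = [1, 1, 1, 1, 0, 0, 0, 0]


def f_measure(y_true, y_pred):
    """F1 score for the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def apply_rule(rule, row):
    """A rule is (feature index, threshold, direction); returns 1 if it fires."""
    j, thr, direction = rule
    fires = row[j] > thr if direction == ">" else row[j] <= thr
    return 1 if fires else 0


def extract_rules(X, y):
    """Score every single-feature threshold rule by its F-measure."""
    scored = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            for direction in (">", "<="):
                rule = (j, (lo + hi) / 2, direction)
                preds = [apply_rule(rule, row) for row in X]
                scored.append((rule, f_measure(y, preds)))
    return scored


def select_rules(scored, min_f=0.7):
    """Keep rules whose F-measure reaches the (assumed) cut-off."""
    return [rule for rule, score in scored if score >= min_f]


def score(row, rules, w, b):
    """Weighted sum of the binary rule outputs plus a bias."""
    return b + sum(wi * apply_rule(r, row) for wi, r in zip(w, rules))


def predict(row, rules, w, b):
    return 1 if score(row, rules, w, b) > 0 else 0


def train_perceptron(X, y, rules, max_epochs=200):
    """Integrate the selected rules with one perceptron unit."""
    w, b = [0.0] * len(rules), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for row, target in zip(X, y):
            step = target - predict(row, rules, w, b)
            if step != 0:
                w = [wi + step * apply_rule(r, row) for wi, r in zip(w, rules)]
                b += step
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


scored = extract_rules(X, y)
best_rule, best_f = max(scored, key=lambda rs: rs[1])
rules = select_rules(scored)
w, b = train_perceptron(X, y, rules)
print("best rule:", best_rule, "F =", best_f)
print("rules kept:", len(rules))
print("training predictions:", [predict(row, rules, w, b) for row in X])
```

Because each input to the integrator is the output of a readable rule such as "feature 0 > threshold", the learned weights show how much each rule contributes to the final decision, which is the kind of interpretability the abstract emphasises.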

Original language	English
Pages (from-to)	1073-1093
Number of pages	21
Journal	Electronic Library
Volume	38
Issue number	5-6
DOIs	https://doi.org/10.1108/EL-01-2020-0016
Publication status	Published - 12 Dec 2020
Externally published	Yes

Keywords

Classification
Hyperlink analysis
Neural networks
Phishing websites
Rule extraction

Cite this

@article{7d03fe5d3b6b4f9db6ab65a9d04ab2e7,
  title = "Identification of phishing websites through hyperlink analysis and rule extraction",
  abstract = "Purpose: The aim of this study is to propose an efficient rule extraction and integration approach for identifying phishing websites. The proposed approach can elucidate patterns of phishing websites and identify them accurately. Design/methodology/approach: Hyperlink indicators along with URL-based features are used to build the identification model. In the proposed approach, very simple rules are first extracted based on individual features to provide meaningful and easy-to-understand rules. Then, the F-measure score is used to select high-quality rules for identifying phishing websites. To construct a reliable and promising phishing website identification model, the selected rules are integrated using a simple neural network model. Findings: Experiments conducted using self-collected and benchmark data sets show that the proposed approach outperforms 16 commonly used classifiers (including seven non–rule-based and four rule-based classifiers as well as five deep learning models) in terms of interpretability and identification performance. Originality/value: Investigating patterns of phishing websites based on hyperlink indicators using the efficient rule-based approach is innovative. It is not only helpful for identifying phishing websites, but also beneficial for extracting simple and understandable rules.",
  keywords = "Classification, Hyperlink analysis, Neural networks, Phishing websites, Rule extraction",
  author = "Chaoqun Wang and Zhongyi Hu and Raymond Chiong and Yukun Bao and Jiang Wu",
  note = "Publisher Copyright: {\textcopyright} 2020, Emerald Publishing Limited.",
  year = "2020",
  month = dec,
  day = "12",
  doi = "10.1108/EL-01-2020-0016",
  language = "English",
  volume = "38",
  pages = "1073--1093",
  journal = "Electronic Library",
  issn = "0264-0473",
  number = "5-6",
}

TY - JOUR
T1 - Identification of phishing websites through hyperlink analysis and rule extraction
AU - Wang, Chaoqun
AU - Hu, Zhongyi
AU - Chiong, Raymond
AU - Bao, Yukun
AU - Wu, Jiang
PY - 2020/12/12
Y1 - 2020/12/12

N2 - Purpose: The aim of this study is to propose an efficient rule extraction and integration approach for identifying phishing websites. The proposed approach can elucidate patterns of phishing websites and identify them accurately. Design/methodology/approach: Hyperlink indicators along with URL-based features are used to build the identification model. In the proposed approach, very simple rules are first extracted based on individual features to provide meaningful and easy-to-understand rules. Then, the F-measure score is used to select high-quality rules for identifying phishing websites. To construct a reliable and promising phishing website identification model, the selected rules are integrated using a simple neural network model. Findings: Experiments conducted using self-collected and benchmark data sets show that the proposed approach outperforms 16 commonly used classifiers (including seven non–rule-based and four rule-based classifiers as well as five deep learning models) in terms of interpretability and identification performance. Originality/value: Investigating patterns of phishing websites based on hyperlink indicators using the efficient rule-based approach is innovative. It is not only helpful for identifying phishing websites, but also beneficial for extracting simple and understandable rules.

AB - Purpose: The aim of this study is to propose an efficient rule extraction and integration approach for identifying phishing websites. The proposed approach can elucidate patterns of phishing websites and identify them accurately. Design/methodology/approach: Hyperlink indicators along with URL-based features are used to build the identification model. In the proposed approach, very simple rules are first extracted based on individual features to provide meaningful and easy-to-understand rules. Then, the F-measure score is used to select high-quality rules for identifying phishing websites. To construct a reliable and promising phishing website identification model, the selected rules are integrated using a simple neural network model. Findings: Experiments conducted using self-collected and benchmark data sets show that the proposed approach outperforms 16 commonly used classifiers (including seven non–rule-based and four rule-based classifiers as well as five deep learning models) in terms of interpretability and identification performance. Originality/value: Investigating patterns of phishing websites based on hyperlink indicators using the efficient rule-based approach is innovative. It is not only helpful for identifying phishing websites, but also beneficial for extracting simple and understandable rules.

KW - Classification
KW - Hyperlink analysis
KW - Neural networks
KW - Phishing websites
KW - Rule extraction
UR - http://www.scopus.com/inward/record.url?scp=85096580389&partnerID=8YFLogxK
U2 - 10.1108/EL-01-2020-0016
DO - 10.1108/EL-01-2020-0016
M3 - Article
AN - SCOPUS:85096580389
SN - 0264-0473
VL - 38
SP - 1073
EP - 1093
JO - Electronic Library
JF - Electronic Library
IS - 5-6
ER -
