Behind the Bait: Delving into PhishTank's hidden data

Affan Yasin, Rubia Fatima, Javed Ali Khan, Wasif Afzal*

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-review)

1 Citation (Scopus)

Abstract

Phishing is a form of social engineering that aims to deceive individuals through email communication. Extensive prior research has identified phishing as one of the most commonly used attack vectors for infiltrating organizational networks. A prevalent method misleads the target with phishing URLs concealed behind hyperlinks. PhishTank, a crowd-sourced website, aggregates reported phishing URLs and subsequently verifies whether they are genuine phishing sites. In this study, we used a Python script to extract data from the PhishTank website, compiling a dataset of over 190,000 phishing URLs. Researchers and practitioners can use this dataset to improve phishing filters, strengthen firewalls, support security education, and train and test detection models, among other applications.
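The authors' extraction script is not reproduced in this record. As a hedged illustration of the kind of post-processing such a dataset typically needs, the sketch below filters community-verified entries, drops exact duplicate URLs, and records each URL's host. It assumes a CSV export with hypothetical columns `phish_id`, `url`, and `verified`, loosely modelled on PhishTank's public data feed rather than on the authors' actual output schema, and it works on in-memory text so no network access is required.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical column names (an assumption, loosely modelled on PhishTank's
# downloadable feed); the authors' own scraper output is not described here.
URL_FIELD = "url"
VERIFIED_FIELD = "verified"


def load_verified_urls(csv_text: str) -> list[dict]:
    """Return de-duplicated, verified phishing URLs together with their host."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: set[str] = set()
    records: list[dict] = []
    for row in reader:
        url = (row.get(URL_FIELD) or "").strip()
        # Keep only entries marked as verified phish ("yes" in this assumed schema).
        if not url or (row.get(VERIFIED_FIELD) or "").strip().lower() != "yes":
            continue
        # Skip exact duplicate submissions of the same URL.
        if url in seen:
            continue
        seen.add(url)
        records.append({"url": url, "host": urlsplit(url).hostname or ""})
    return records


# Illustrative rows only (reserved example domains, not real PhishTank data).
sample = """phish_id,url,verified
1,http://paypal-login.example.com/verify,yes
2,http://paypal-login.example.com/verify,yes
3,https://bank.example.net/signin,no
4,https://secure-update.example.org/account,yes
"""

records = load_verified_urls(sample)
print(len(records))
print(records[0]["host"])
```

Here the duplicate row and the unverified row are discarded, leaving two records. Keying de-duplication on the exact URL string is a deliberately simple choice; a real pipeline might also normalise case, trailing slashes, or query strings before comparing.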

Original language: English
Article number: 109959
Journal: Data in Brief
Volume: 52
Publication status: Published (Feb 2024)
Externally published: Yes

Keywords

  • Artificial intelligence
  • Computer security
  • Dataset
  • Email security
  • Phished URL
  • Social engineering
  • Web security
