TY - JOUR
T1 - Unraveled — A semi-synthetic dataset for Advanced Persistent Threats
AU - Myneni, Sowmya
AU - Jha, Kritshekhar
AU - Sabur, Abdulhakim
AU - Agrawal, Garima
AU - Deng, Yuli
AU - Chowdhary, Ankur
AU - Huang, Dijiang
N1 - Publisher Copyright:
© 2023
PY - 2023/5
Y1 - 2023/5
N2 - Unraveled is a novel cybersecurity dataset capturing Advanced Persistent Threat (APT) attacks not available in the public domain. Existing cybersecurity datasets lack coherent information about sophisticated and persistent cyber-attack features, including attack planning and deployment, stealthiness of the attacker(s), longer dorm period between attack activities, etc. Our APT attack scenario in Unraveled is implemented on a real network system established on a cloud platform to emulate an organization's network system. The new dataset provides a comprehensive network flow and host-level log information about the normal user(s) traffic and the cyber attacks traffic. To emulate realistic network traffic scenarios, Unraveled also includes attacks at different skills reflecting a typical organization's threat posture, and by utilizing APT attack information from one of the well-known APT attack databases, i.e., MITRE's APT-group database. Furthermore, we design and develop an Employee Behavior Generation (EBG) model to emulate multiple normal employees’ traffic and activities during a 6-week time period based on their pre-defined business functions. Using well-known machine learning models for anomaly detection, we show that the APT attack activities in Unraveled are hardly detected, indicating the need for more effective solutions that are based on datasets representing real world APT attacks.
AB - Unraveled is a novel cybersecurity dataset capturing Advanced Persistent Threat (APT) attacks not available in the public domain. Existing cybersecurity datasets lack coherent information about sophisticated and persistent cyber-attack features, including attack planning and deployment, stealthiness of the attacker(s), longer dorm period between attack activities, etc. Our APT attack scenario in Unraveled is implemented on a real network system established on a cloud platform to emulate an organization's network system. The new dataset provides a comprehensive network flow and host-level log information about the normal user(s) traffic and the cyber attacks traffic. To emulate realistic network traffic scenarios, Unraveled also includes attacks at different skills reflecting a typical organization's threat posture, and by utilizing APT attack information from one of the well-known APT attack databases, i.e., MITRE's APT-group database. Furthermore, we design and develop an Employee Behavior Generation (EBG) model to emulate multiple normal employees’ traffic and activities during a 6-week time period based on their pre-defined business functions. Using well-known machine learning models for anomaly detection, we show that the APT attack activities in Unraveled are hardly detected, indicating the need for more effective solutions that are based on datasets representing real world APT attacks.
KW - Advanced Persistent Threats
KW - Dataset
KW - Threat detection
UR - https://www.scopus.com/pages/publications/85150027764
U2 - 10.1016/j.comnet.2023.109688
DO - 10.1016/j.comnet.2023.109688
M3 - Article
AN - SCOPUS:85150027764
SN - 1389-1286
VL - 227
JO - Computer Networks
JF - Computer Networks
M1 - 109688
ER -