TY - GEN
T1 - Reconstructing Classification to Enhance Machine-Learning Based Network Intrusion Detection by Embracing Ambiguity
AU - Song, Chungsik
AU - Fan, Wenjun
AU - Chang, Sang Yoon
AU - Park, Younghee
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Network intrusion detection systems (IDS) have efficiently identified the profiles of normal network activities, extracted intrusion patterns, and constructed generalized models to evaluate (un)known attacks using a wide range of machine learning approaches. Despite the effectiveness of machine learning-based IDS, it remains challenging to reduce the high rate of false alarms caused by data misclassification. In this paper, by using multiple decision mechanisms, we propose a new classification method that identifies misclassified data and then classifies the data into three different classes: malicious, benign, and ambiguous. The ambiguous dataset contains a majority of the misclassified data and is thus the most informative for improving the model and anomaly detection, because the model lacks confidence in classifying these data. We evaluate our approach with recent real-world network traffic data, the Kyoto2006+ datasets, and show that the ambiguous dataset contains 77.2% of the previously misclassified data. Re-evaluating the ambiguous dataset effectively reduces the false prediction rate with minimal overhead and improves accuracy by 15%.
AB - Network intrusion detection systems (IDS) have efficiently identified the profiles of normal network activities, extracted intrusion patterns, and constructed generalized models to evaluate (un)known attacks using a wide range of machine learning approaches. Despite the effectiveness of machine learning-based IDS, it remains challenging to reduce the high rate of false alarms caused by data misclassification. In this paper, by using multiple decision mechanisms, we propose a new classification method that identifies misclassified data and then classifies the data into three different classes: malicious, benign, and ambiguous. The ambiguous dataset contains a majority of the misclassified data and is thus the most informative for improving the model and anomaly detection, because the model lacks confidence in classifying these data. We evaluate our approach with recent real-world network traffic data, the Kyoto2006+ datasets, and show that the ambiguous dataset contains 77.2% of the previously misclassified data. Re-evaluating the ambiguous dataset effectively reduces the false prediction rate with minimal overhead and improves accuracy by 15%.
KW - Ensemble classifiers
KW - Machine learning
KW - Network intrusion detection
UR - http://www.scopus.com/inward/record.url?scp=85107446076&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-72725-3_13
DO - 10.1007/978-3-030-72725-3_13
M3 - Conference Proceeding
SN - 9783030727246
T3 - Communications in Computer and Information Science
SP - 169
EP - 187
BT - Silicon Valley Cybersecurity Conference - First Conference, SVCC 2020, Revised Selected Papers
A2 - Park, Younghee
A2 - Jadav, Divyesh
A2 - Austin, Thomas
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st Silicon Valley Cybersecurity Conference, SVCC 2020
Y2 - 17 December 2020 through 19 December 2020
ER -