SHAPE: A Simultaneous Header and Payload Encoding Model for Encrypted Traffic Classification

Jianbang Dai; Xiaolong Xu; Honghao Gao; Xinheng Wang; Fu Xiao

doi:10.1109/TNSM.2022.3213758

Jianbang Dai, Xiaolong Xu^*, Honghao Gao, Xinheng Wang, Fu Xiao

^*Corresponding author for this work

Department of Mechatronics and Robotics

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Many end-to-end deep learning algorithms seeking to classify malicious traffic and encrypted traffic have been proposed in recent years. End-to-end deep learning algorithms require a large number of samples to train a model. However, it is hard for existing methods to fully utilize the heterogeneous multimodal input. To this end, we propose the SHAPE model (simultaneous header and payload encoding), which mainly consists of two autoencoders and a Transformer layer, to improve model performance. The two autoencoders extract features from heterogeneous inputs - the statistical information of each packet and byte-form payloads - and convert them into a unified format; then, a lightweight Transformer layer further extracts the relationships hidden in the simultaneous input. In particular, the autoencoder for payload feature extraction contains several depthwise separable residual convolution layers for efficient feature extraction and a token squeeze layer to reduce the computing overhead of the Transformer layer. Moreover, we train the SHAPE model using deep metric learning, which pulls samples with the same class label together and separates samples from different classes in the low-dimensional embedding space. Thus, the SHAPE model can naturally handle multitask classification, and its performance is approximately 5.43% better than the current SOTA on the traffic type classification of the ISCX-VPN2016 dataset, at the cost of 9.31 times the training time and 1.45 times the inference time.
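
The deep metric learning objective mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which metric loss SHAPE uses, so the triplet margin loss below is an assumption chosen only because it is a common way to pull same-class embeddings together and push different-class embeddings apart. The function names, the margin value, and the toy 2-D embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (an assumed stand-in, not the paper's loss).

    The loss is zero once the negative sample is at least `margin` farther
    from the anchor than the positive sample is; otherwise it is positive
    and its gradient would move the embeddings toward that condition.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


# Toy 2-D embeddings with made-up values (not taken from the paper).
anchor = [0.0, 0.0]    # a flow of some traffic class
positive = [0.0, 1.0]  # another flow with the same class label
negative = [3.0, 4.0]  # a flow from a different class, already far away

well_separated = triplet_margin_loss(anchor, positive, negative)
too_close = triplet_margin_loss(anchor, positive, [0.0, 1.5])
```

In this sketch a triplet contributes to training only while the different-class sample sits within the margin, which is why the second call yields a nonzero loss and the first does not.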

Original language	English
Pages (from-to)	1993-2012
Number of pages	20
Journal	IEEE Transactions on Network and Service Management
Volume	20
Issue number	2
DOIs	https://doi.org/10.1109/TNSM.2022.3213758
Publication status	Published - 1 Jun 2023

Keywords

Traffic classification
autoencoder
deep metric learning
encrypted traffic
transformer

Access to Document

10.1109/TNSM.2022.3213758

Cite this

@article{3dbdb809c5974ddfb5f0e5536b76d583,

title = "SHAPE: A Simultaneous Header and Payload Encoding Model for Encrypted Traffic Classification",

abstract = "Many end-to-end deep learning algorithms seeking to classify malicious traffic and encrypted traffic have been proposed in recent years. End-to-end deep learning algorithms require a large number of samples to train a model. However, it is hard for existing methods to fully utilize the heterogeneous multimodal input. To this end, we propose the SHAPE model (simultaneous header and payload encoding), which mainly consists of two autoencoders and a Transformer layer, to improve model performance. The two autoencoders extract features from heterogeneous inputs - the statistical information of each packet and byte-form payloads - and convert them into a unified format; then, a lightweight Transformer layer further extracts the relationships hidden in the simultaneous input. In particular, the autoencoder for payload feature extraction contains several depthwise separable residual convolution layers for efficient feature extraction and a token squeeze layer to reduce the computing overhead of the Transformer layer. Moreover, we train the SHAPE model using deep metric learning, which pulls samples with the same class label together and separates samples from different classes in the low-dimensional embedding space. Thus, the SHAPE model can naturally handle multitask classification, and its performance is approximately 5.43% better than the current SOTA on the traffic type classification of the ISCX-VPN2016 dataset, at the cost of 9.31 times the training time and 1.45 times the inference time.",

keywords = "Traffic classification, autoencoder, deep metric learning, encrypted traffic, transformer",

author = "Jianbang Dai and Xiaolong Xu and Honghao Gao and Xinheng Wang and Fu Xiao",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2023",

month = jun,

day = "1",

doi = "10.1109/TNSM.2022.3213758",

language = "English",

volume = "20",

pages = "1993--2012",

journal = "IEEE Transactions on Network and Service Management",

issn = "1932-4537",

number = "2",

}

TY - JOUR

T1 - SHAPE

T2 - A Simultaneous Header and Payload Encoding Model for Encrypted Traffic Classification

AU - Dai, Jianbang

AU - Xu, Xiaolong

AU - Gao, Honghao

AU - Wang, Xinheng

AU - Xiao, Fu

PY - 2023/6/1

Y1 - 2023/6/1

N2 - Many end-to-end deep learning algorithms seeking to classify malicious traffic and encrypted traffic have been proposed in recent years. End-to-end deep learning algorithms require a large number of samples to train a model. However, it is hard for existing methods to fully utilize the heterogeneous multimodal input. To this end, we propose the SHAPE model (simultaneous header and payload encoding), which mainly consists of two autoencoders and a Transformer layer, to improve model performance. The two autoencoders extract features from heterogeneous inputs - the statistical information of each packet and byte-form payloads - and convert them into a unified format; then, a lightweight Transformer layer further extracts the relationships hidden in the simultaneous input. In particular, the autoencoder for payload feature extraction contains several depthwise separable residual convolution layers for efficient feature extraction and a token squeeze layer to reduce the computing overhead of the Transformer layer. Moreover, we train the SHAPE model using deep metric learning, which pulls samples with the same class label together and separates samples from different classes in the low-dimensional embedding space. Thus, the SHAPE model can naturally handle multitask classification, and its performance is approximately 5.43% better than the current SOTA on the traffic type classification of the ISCX-VPN2016 dataset, at the cost of 9.31 times the training time and 1.45 times the inference time.

AB - Many end-to-end deep learning algorithms seeking to classify malicious traffic and encrypted traffic have been proposed in recent years. End-to-end deep learning algorithms require a large number of samples to train a model. However, it is hard for existing methods to fully utilize the heterogeneous multimodal input. To this end, we propose the SHAPE model (simultaneous header and payload encoding), which mainly consists of two autoencoders and a Transformer layer, to improve model performance. The two autoencoders extract features from heterogeneous inputs - the statistical information of each packet and byte-form payloads - and convert them into a unified format; then, a lightweight Transformer layer further extracts the relationships hidden in the simultaneous input. In particular, the autoencoder for payload feature extraction contains several depthwise separable residual convolution layers for efficient feature extraction and a token squeeze layer to reduce the computing overhead of the Transformer layer. Moreover, we train the SHAPE model using deep metric learning, which pulls samples with the same class label together and separates samples from different classes in the low-dimensional embedding space. Thus, the SHAPE model can naturally handle multitask classification, and its performance is approximately 5.43% better than the current SOTA on the traffic type classification of the ISCX-VPN2016 dataset, at the cost of 9.31 times the training time and 1.45 times the inference time.

KW - Traffic classification

KW - autoencoder

KW - deep metric learning

KW - encrypted traffic

KW - transformer

UR - http://www.scopus.com/inward/record.url?scp=85139868283&partnerID=8YFLogxK

U2 - 10.1109/TNSM.2022.3213758

DO - 10.1109/TNSM.2022.3213758

M3 - Article

AN - SCOPUS:85139868283

SN - 1932-4537

VL - 20

SP - 1993

EP - 2012

JO - IEEE Transactions on Network and Service Management

JF - IEEE Transactions on Network and Service Management

IS - 2

ER -
