TY - GEN
T1 - Generating Valid and Natural Adversarial Examples with Large Language Models
AU - Wang, Zimu
AU - Wang, Wei
AU - Chen, Qi
AU - Wang, Qiufeng
AU - Nguyen, Anh
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been shown to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, sacrificing semantic preservation, grammaticality, and human imperceptibility. Leveraging the exceptional language understanding and generation capabilities of large language models (LLMs), we propose LLM-Attack, which aims to generate both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against baseline adversarial attack models illustrate the effectiveness of LLM-Attack, which outperforms the baselines in human and GPT-4 evaluation by a significant margin. The model can generate adversarial examples that are typically valid and natural, preserving semantic meaning, grammaticality, and human imperceptibility.
AB - Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been shown to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, sacrificing semantic preservation, grammaticality, and human imperceptibility. Leveraging the exceptional language understanding and generation capabilities of large language models (LLMs), we propose LLM-Attack, which aims to generate both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against baseline adversarial attack models illustrate the effectiveness of LLM-Attack, which outperforms the baselines in human and GPT-4 evaluation by a significant margin. The model can generate adversarial examples that are typically valid and natural, preserving semantic meaning, grammaticality, and human imperceptibility.
KW - Adversarial attack
KW - adversarial examples
KW - large language models
KW - natural language processing
KW - text classification
UR - http://www.scopus.com/inward/record.url?scp=85199078756&partnerID=8YFLogxK
U2 - 10.1109/CSCWD61410.2024.10580402
DO - 10.1109/CSCWD61410.2024.10580402
M3 - Conference Proceeding
AN - SCOPUS:85199078756
T3 - Proceedings of the 2024 27th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2024
SP - 1716
EP - 1721
BT - Proceedings of the 2024 27th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2024
A2 - Shen, Weiming
A2 - Barthes, Jean-Paul
A2 - Luo, Junzhou
A2 - Qiu, Tie
A2 - Zhou, Xiaobo
A2 - Zhang, Jinghui
A2 - Zhu, Haibin
A2 - Peng, Kunkun
A2 - Xu, Tianyi
A2 - Chen, Ning
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th International Conference on Computer Supported Cooperative Work in Design, CSCWD 2024
Y2 - 8 May 2024 through 10 May 2024
ER -