
Adversarial Attacks and Defenses on Large Language Models: A Systematic Review

  • Ruobing Li
  • Jiayi Liu
  • Jinyang Zhen
  • Yuyuan Bao
  • Yiqing Qiang
  • Yue Xu
  • Yuxi Gu
  • Yuxin Jin
  • Daoming Yu
  • Yunchao He
  • Mian Zhou
  • Angelos Stefanidis
  • Jionglong Su*

*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

The rapid advancement of large language models (LLMs) presents both transformative opportunities and significant security challenges, primarily due to their susceptibility to adversarial attacks (crafted inputs designed to deceive them), which pose considerable risks to their reliability and security. This survey comprehensively reviews these textual adversarial attacks, categorizing them into sentence-level, word-level, and character-level manipulations, while also exploring multi-level approaches that combine these techniques. To counter such threats, the survey examines a range of defense strategies, including detection methods, adversarial training, semantic analysis, general system enhancements, and approaches to certified robustness for provable resilience. Despite notable progress in understanding and mitigating these vulnerabilities, significant challenges remain regarding real-world applicability, scalability, generalization across different models and attack types, and the establishment of standardized benchmarking. By synthesizing recent research in this dynamic field, this survey aims to guide the development of more secure and robust language models capable of withstanding the continuously evolving landscape of adversarial threats.
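To make the attack taxonomy concrete, the following is a minimal, hypothetical sketch of a character-level manipulation of the kind the survey categorizes: it introduces typo-style perturbations by swapping adjacent characters inside randomly selected words. The function name, parameters, and swap strategy are illustrative assumptions, not a specific attack from the surveyed literature.

```python
import random

def char_swap_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Illustrative character-level perturbation (hypothetical sketch):
    for each word longer than three characters, with probability `rate`,
    swap one pair of adjacent interior characters. Word count and the
    multiset of characters in each word are preserved."""
    rng = random.Random(seed)
    perturbed = []
    for w in text.split():
        if len(w) > 3 and rng.random() < rate:
            # Pick an interior position and swap it with its neighbor.
            i = rng.randrange(1, len(w) - 2)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        perturbed.append(w)
    return " ".join(perturbed)
```

Real character-level attacks are typically guided by model feedback (e.g., choosing the swap that most degrades the target model's output) rather than applied at random; this sketch only shows the perturbation mechanics.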

Original language: English
Title of host publication: 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 157-164
Number of pages: 8
ISBN (Electronic): 9798350392524
DOIs
Publication status: Published - 2025
Event: 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025 - Taicang, China
Duration: 22 Aug 2025 - 24 Aug 2025

Publication series

Name: 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025

Conference

Conference: 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
Country/Territory: China
City: Taicang
Period: 22/08/25 - 24/08/25

Keywords

  • Certified Robustness
  • Char-level Attack
  • Multi-level Attack
  • Sentence-level Attack
  • Textual Adversarial Attack
  • Texture Model Defence
  • Word-level Attack
