TY - GEN
T1 - Adversarial Attacks and Defenses on Large Language Models
T2 - 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
AU - Li, Ruobing
AU - Liu, Jiayi
AU - Zhen, Jinyang
AU - Bao, Yuyuan
AU - Qiang, Yiqing
AU - Xu, Yue
AU - Gu, Yuxi
AU - Jin, Yuxin
AU - Yu, Daoming
AU - He, Yunchao
AU - Zhou, Mian
AU - Stefanidis, Angelos
AU - Su, Jionglong
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The rapid advancement of large language models (LLMs) presents both transformative opportunities and significant security challenges, primarily due to their susceptibility to adversarial attacks (crafted inputs designed to deceive them), which pose considerable risks to their reliability and security. This survey comprehensively reviews these textual adversarial attacks, categorizing them into sentence-level, word-level, and character-level manipulations, while also exploring multi-level approaches that combine these techniques. To counter such threats, the survey examines a range of defense strategies, including detection methods, adversarial training, semantic analysis, general system enhancements, and approaches to certified robustness for provable resilience. Despite notable progress in understanding and mitigating these vulnerabilities, significant challenges remain regarding real-world applicability, scalability, generalization across different models and attack types, and the establishment of standardized benchmarking. By synthesizing recent research in this dynamic field, this survey aims to guide the development of more secure and robust language models capable of withstanding the continuously evolving landscape of adversarial threats.
AB - The rapid advancement of large language models (LLMs) presents both transformative opportunities and significant security challenges, primarily due to their susceptibility to adversarial attacks (crafted inputs designed to deceive them), which pose considerable risks to their reliability and security. This survey comprehensively reviews these textual adversarial attacks, categorizing them into sentence-level, word-level, and character-level manipulations, while also exploring multi-level approaches that combine these techniques. To counter such threats, the survey examines a range of defense strategies, including detection methods, adversarial training, semantic analysis, general system enhancements, and approaches to certified robustness for provable resilience. Despite notable progress in understanding and mitigating these vulnerabilities, significant challenges remain regarding real-world applicability, scalability, generalization across different models and attack types, and the establishment of standardized benchmarking. By synthesizing recent research in this dynamic field, this survey aims to guide the development of more secure and robust language models capable of withstanding the continuously evolving landscape of adversarial threats.
KW - Certified Robustness
KW - Char-level Attack
KW - Multi-level Attack
KW - Sentence-level Attack
KW - Textual Adversarial Attack
KW - Textual Model Defence
KW - Word-level Attack
UR - https://www.scopus.com/pages/publications/105033339600
U2 - 10.1109/BDAI66031.2025.11325628
DO - 10.1109/BDAI66031.2025.11325628
M3 - Conference Proceeding
AN - SCOPUS:105033339600
T3 - 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
SP - 157
EP - 164
BT - 2025 8th International Conference on Big Data and Artificial Intelligence, BDAI 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 August 2025 through 24 August 2025
ER -