Abstract
LLMs increasingly integrate auto-suggestion optimization modules that rewrite and display user input before generating the final response. While this design aims to enhance transparency and trust, the process of autonomously selecting a single "best" result from multiple candidates allows attackers to hijack the optimization by inducing subtle, imperceptible semantic shifts. To address this, we propose a semantics-preserving hijacking attack for the black-box setting: Adaptive Greedy Local Search. The method hierarchically decomposes the input text, masks key linguistic units, and dynamically adjusts candidate replacement words at predefined semantic checkpoints, maximizing the deviation between the model's output and the original intent while strictly maintaining semantic similarity to the original text. Experiments on commercial and open-source LLMs show that, under the same semantic-similarity constraints, the method achieves a higher attack success rate than existing attacks across more than 2,400 test cases.
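The greedy local-search loop the abstract describes can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `semantic_similarity` and `attack_objective` are toy stand-ins (word-overlap similarity and a trigger-word count) for the embedding-based similarity model and output-deviation measure an actual attack would query, and the candidate dictionary is invented for the example.

```python
def semantic_similarity(a, b):
    """Toy similarity: Jaccard overlap of word sets (stand-in for an
    embedding-based similarity model)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def attack_objective(text):
    """Toy objective: count attacker-chosen trigger words (stand-in for
    measuring how far the model's output drifts from the original intent)."""
    triggers = {"wire", "transfer"}
    return sum(w in triggers for w in text.split())

def greedy_local_search(text, candidates, sim_threshold=0.6):
    """Greedily substitute one word at a time with the candidate that most
    increases the objective, rejecting any edit that drops semantic
    similarity below the threshold (the 'semantic checkpoint')."""
    best = text.split()
    best_score = attack_objective(" ".join(best))
    improved = True
    while improved:
        improved = False
        for i, word in enumerate(list(best)):
            for cand in candidates.get(word, []):
                trial = best[:i] + [cand] + best[i + 1:]
                trial_text = " ".join(trial)
                if semantic_similarity(text, trial_text) < sim_threshold:
                    continue  # checkpoint: discard semantically drifting edits
                score = attack_objective(trial_text)
                if score > best_score:
                    best, best_score, improved = trial, score, True
    return " ".join(best)

original = "please send the payment to my account"
cands = {"send": ["wire", "transfer"], "payment": ["funds"]}
adversarial = greedy_local_search(original, cands)
```

Here the substitution "send" → "wire" is accepted because it raises the toy objective while keeping similarity above the threshold, whereas "payment" → "funds" on top of it would push similarity below 0.6 and is rejected at the checkpoint.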
| Original language | English |
|---|---|
| Title of host publication | The IEEE International Conference on Multimedia & Expo 2026 |
| Publisher | IEEE Press |
| Chapter | 1 |
| Pages | 1-12 |
| Number of pages | 12 |
| Publication status | Published - 5 Jul 2026 |
| Event | The IEEE International Conference on Multimedia & Expo 2026 (ICME 2026), Bangkok, Thailand. Duration: 5 Jul 2026 → 9 Jul 2026. https://2026.ieeeicme.org/ |
Conference
| Conference | The IEEE International Conference on Multimedia & Expo 2026 |
|---|---|
| Country/Territory | Thailand |
| City | Bangkok |
| Period | 5/07/26 → 9/07/26 |
| Internet address | https://2026.ieeeicme.org/ |