
Semantic-Preserving Prompt Hijacking: A Black-Box Adversarial Attack on Auto-Prompt Optimization

  • The Chinese University of Hong Kong (CUHK)

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review


Abstract

LLMs increasingly integrate auto-suggestion optimization modules that rewrite and display user input before generating the final response. While this design aims to enhance transparency and trust, its process of autonomously selecting a single "best" result from multiple candidate solutions allows attackers to hijack the optimization process by inducing subtle, imperceptible semantic shifts. To expose this weakness, we propose a semantic-preserving hijacking attack for black-box settings, Adaptive Greedy Local Search. The method hierarchically decomposes the input text, masks key language units, and dynamically adjusts candidate replacement words at predefined semantic checkpoints. This maximizes the deviation between the model output and the original intent while strictly maintaining semantic similarity to the original text. Experimental results on commercial and open-source LLMs demonstrate that, under the same semantic similarity constraints, the method achieves a higher attack success rate than existing attack methods across more than 2,400 test cases.
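The abstract describes a greedy local search that swaps words while keeping the text similar to the original. The following is a minimal sketch of that general idea only, not the authors' implementation: it omits the hierarchical decomposition, masking, and semantic checkpoints, and every name in it (`greedy_local_search`, `jaccard`, `toy_deviation`, `CANDIDATES`, `WEIGHTS`, the 0.6 threshold) is a hypothetical placeholder. In a real black-box setting the deviation score would presumably come from querying the target model, and the similarity from a sentence-embedding measure; here both are toy stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Tokens = List[str]


def jaccard(a: Tokens, b: Tokens) -> float:
    """Toy similarity: overlap of the two token sets (assumption, not the paper's metric)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def greedy_local_search(
    tokens: Tokens,
    candidates: Dict[str, List[str]],
    deviation: Callable[[Tokens], float],
    similarity: Callable[[Tokens, Tokens], float],
    min_sim: float,
) -> Tuple[Tokens, float]:
    """Accept any single-word substitution that raises the deviation score
    while similarity to the original stays at or above min_sim.
    Repeat full passes until one pass yields no improvement."""
    current = list(tokens)
    best = deviation(current)
    improved = True
    while improved:
        improved = False
        for i, original_word in enumerate(tokens):
            # Candidates are looked up by the ORIGINAL word at this position.
            for cand in candidates.get(original_word, []):
                trial = current[:i] + [cand] + current[i + 1:]
                if similarity(tokens, trial) < min_sim:
                    continue  # violates the similarity constraint
                score = deviation(trial)
                if score > best:  # strict improvement guarantees termination
                    current, best, improved = trial, score, True
    return current, best


# Hypothetical replacement candidates and per-word deviation weights.
CANDIDATES = {
    "summarize": ["condense", "outline"],
    "report": ["document"],
    "briefly": ["concisely", "tersely"],
}
WEIGHTS = {"condense": 0.3, "outline": 0.7, "document": 0.1,
           "concisely": 0.2, "tersely": 0.5}


def toy_deviation(toks: Tokens) -> float:
    """Toy objective: sum of the weights of substituted words present."""
    return sum(WEIGHTS.get(t, 0.0) for t in toks)


if __name__ == "__main__":
    prompt = "summarize the quarterly report briefly".split()
    rewritten, score = greedy_local_search(
        prompt, CANDIDATES, toy_deviation, jaccard, min_sim=0.6
    )
    print(" ".join(rewritten), score)
```

With the toy Jaccard measure and a threshold of 0.6, a five-word prompt tolerates only one substitution, so the search settles on the single swap with the largest toy weight. The constraint check runs before the objective is evaluated, which in a query-limited black-box setting would avoid spending model calls on candidates that are already too dissimilar.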
Original language: English
Title of host publication: The IEEE International Conference on Multimedia & Expo 2026
Publisher: IEEE Press
Chapter: 1
Pages: 1-12
Number of pages: 12
Publication status: Published - 5 Jul 2026
Event: The IEEE International Conference on Multimedia & Expo 2026: ICME 2026 - Bangkok, Thailand
Duration: 5 Jul 2026 to 9 Jul 2026
https://2026.ieeeicme.org/

Conference

Conference: The IEEE International Conference on Multimedia & Expo 2026
Country/Territory: Thailand
City: Bangkok
Period: 5/07/26 to 9/07/26
Internet address: https://2026.ieeeicme.org/
