Imperceptible Transfer Attack on Large Vision-Language Models

Xiaowen Cai, Daizong Liu, Runwei Guan, Pan Zhou*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

In spite of achieving significant progress in recent years, Large Vision-Language Models (LVLMs) have been shown to be vulnerable to adversarial examples. There is therefore an urgent need for effective adversarial attacks that expose the deficiencies of LVLMs in security-sensitive applications. However, existing LVLM attackers generally optimize adversarial samples against a specific textual prompt and a particular LVLM model, tending to overfit the target prompt/network and hardly remaining malicious once transferred to attack a different prompt/model. To this end, in this paper, we propose a novel Imperceptible Transfer Attack (ITA) against LVLMs that generates prompt/model-agnostic adversarial samples to enhance adversarial transferability while further improving imperceptibility. Specifically, we learn to apply appropriate visual transformations to image inputs to create diverse input patterns by selecting the optimal combination of operations from a pool of candidates, thereby improving adversarial transferability. We formulate the selection of optimal transformation combinations as an adversarial learning problem and employ a gradient approximation strategy with noise budget constraints to effectively generate imperceptible transferable samples. Extensive experiments on three LVLM models and two widely used datasets across three tasks demonstrate the superior performance of our ITA.
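The sketch below illustrates the general idea described in the abstract: averaging gradients over image transformations drawn from a candidate pool, under an L_inf noise budget, to produce more transferable yet imperceptible perturbations. It is a minimal sketch only; the transformation pool, the `model_loss` callable, and all hyperparameters (`eps`, `alpha`, `steps`, `n_samples`) are assumptions, and random sampling of transformations stands in for the paper's learned selection of optimal combinations.

```python
import torch
import torch.nn.functional as F


# Hypothetical pool of candidate input transformations (the paper's exact
# operations are not specified here).
def identity(x):
    return x


def horizontal_flip(x):
    return torch.flip(x, dims=[-1])


def resize_pad(x, scale=0.9):
    # Downscale, then zero-pad back to the original spatial size.
    _, _, h, w = x.shape
    nh, nw = int(h * scale), int(w * scale)
    y = F.interpolate(x, size=(nh, nw), mode="bilinear", align_corners=False)
    return F.pad(y, (0, w - nw, 0, h - nh))


def gaussian_noise(x, sigma=0.01):
    return x + sigma * torch.randn_like(x)


TRANSFORM_POOL = [identity, horizontal_flip, resize_pad, gaussian_noise]


def transfer_attack(model_loss, x, eps=8 / 255, alpha=1 / 255, steps=40, n_samples=4):
    """Transformation-averaged transfer attack under an L_inf noise budget.

    model_loss: callable mapping an image batch to a scalar adversarial loss
                (e.g., negative similarity between the LVLM's visual features
                and a target text embedding).
    x:          clean image batch of shape (B, C, H, W), values in [0, 1].
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Approximate the gradient by averaging over sampled transformations,
        # discouraging overfitting to a single input pattern (and hence to a
        # single prompt/model).
        grad = torch.zeros_like(x_adv)
        for _ in range(n_samples):
            t = TRANSFORM_POOL[torch.randint(len(TRANSFORM_POOL), (1,)).item()]
            loss = model_loss(t(x_adv))
            grad += torch.autograd.grad(loss, x_adv)[0]
        grad /= n_samples
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the imperceptibility (noise-budget) ball.
            x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

In this toy form, `transfer_attack` can be called with any differentiable surrogate loss; the paper's actual method additionally learns which transformation combination to apply rather than sampling uniformly.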

Keywords

  • Imperceptible transfer attack
  • LVLMs
