TY - GEN
T1 - Prompt Generation for Enhanced Camouflaged Object Detection in Low-Altitude Economy
AU - Chen, Xuehan
AU - Ren, Guangyu
AU - Hu, Bintao
AU - Zhang, Wenzhang
AU - Yao, Xi
AU - Chen, Xiaoguang
AU - Liu, Hengyan
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - To ensure the safety of both aircraft and operators in low-altitude economy (LAE) activities, precise environmental perception is essential for effective collision prevention. However, accurate perception remains challenging, particularly when obstacles are camouflaged and blend visually into their surroundings, which makes detection difficult even for a robust foundation model such as the Segment Anything Model (SAM). Although SAM's prompt-based strategy improves its performance on the Camouflaged Object Detection (COD) task, its reliance on limited prompts introduces new challenges. Instead of relying on manually annotated prompts, our work introduces a multimodal learning approach that uses a Vision-Language Model (VLM) to generate mask prompts automatically. By integrating visual and textual information, the approach produces high-quality prompts that significantly enhance SAM's performance in identifying camouflaged objects. Experimental results show that the proposed method achieves an average improvement of 13% over the baseline SAM across three COD benchmark datasets.
KW - Camouflaged Object Detection
KW - Low-altitude Economy
KW - Multimodal Learning
KW - Prompt Generation
KW - Segment Anything Model
UR - https://www.scopus.com/pages/publications/105019058129
U2 - 10.1109/VTC2025-Spring65109.2025.11174770
DO - 10.1109/VTC2025-Spring65109.2025.11174770
M3 - Conference Proceeding
AN - SCOPUS:105019058129
T3 - IEEE Vehicular Technology Conference
BT - 2025 IEEE 101st Vehicular Technology Conference, VTC 2025-Spring 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 101st IEEE Vehicular Technology Conference, VTC 2025-Spring 2025
Y2 - 17 June 2025 through 20 June 2025
ER -