
SkinCLIP-VL: Consistency-Aware Vision-Language Learning for Multimodal Skin Cancer Diagnosis

  • XJTLU
  • Xi'an Jiaotong-Liverpool University
  • Mohamed Bin Zayed University of Artificial Intelligence
  • Shanghai Artificial Intelligence Laboratory
  • University of Liverpool

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review


Abstract

The deployment of vision-language models (VLMs) in dermatology is hindered by a trilemma of high computational cost, extreme data scarcity, and the black-box nature of deep learning. To address these challenges, we present SkinCLIP-VL, a resource-efficient framework that adapts foundation models for trustworthy skin cancer diagnosis. Adopting a "frozen perception, adaptive reasoning" paradigm, we integrate a frozen CLIP encoder with a lightweight, quantized Qwen2.5-VL via low-rank adaptation (LoRA). To strictly align visual regions with clinical semantics under long-tailed distributions, we propose the Consistency-aware Focal Alignment (CFA) Loss, an objective that combines focal re-weighting, semantic alignment, and calibration. On the ISIC and Derm7pt benchmarks, SkinCLIP-VL surpasses 13B-parameter baselines by 4.3-6.2% in accuracy with 43% fewer parameters. Crucially, blinded expert evaluation and out-of-distribution testing confirm that our visually grounded rationales significantly enhance clinical trust compared to traditional saliency maps.
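
The abstract names focal re-weighting as one ingredient of the CFA Loss but does not give its formula. As a rough illustration only, the sketch below implements the standard focal loss, -alpha * (1 - p_t)^gamma * log(p_t), which down-weights well-classified examples so that rare, hard classes in a long-tailed dataset contribute more to training. It is not the authors' CFA Loss: the semantic alignment and calibration terms are omitted, and the function names and the defaults gamma=2.0 and alpha=1.0 are assumptions chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example (illustrative, not the CFA Loss).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    gamma and alpha are assumed defaults, not values from the paper.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, recovered from focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    logits = [2.0, 0.0, 0.0]
    # Class 0 is an "easy" target here, class 1 a "hard" one.
    for target in (0, 1):
        ce = cross_entropy(logits, target)
        fl = focal_loss(logits, target)
        print(f"target={target} ce={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

Running the script prints the focal-to-cross-entropy ratio for an easy and a hard target; the easy target is scaled down far more strongly, which is the re-weighting effect the abstract refers to.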
Original language: English
Title of host publication: The IEEE International Conference on Multimedia & Expo 2026
Subtitle of host publication: ICME 2026
Publisher: IEEE Press
Chapter: 1
Pages: 1-6
Number of pages: 6
Publication status: Published - 5 Jul 2026
Event: The IEEE International Conference on Multimedia & Expo 2026: ICME 2026 - Bangkok, Thailand
Duration: 5 Jul 2026 - 9 Jul 2026
https://2026.ieeeicme.org/

Conference

Conference: The IEEE International Conference on Multimedia & Expo 2026
Country/Territory: Thailand
City: Bangkok
Period: 5/07/26 - 9/07/26

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
