Multimodal contrastive learning for radiology report generation

Xing Wu*, Jingwen Li, Jianjia Wang, Quan Qian

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Automated radiology report generation can not only lighten the workload of clinicians but also improve the efficiency of disease diagnosis. However, it is challenging to generate radiology reports that are both semantically coherent and highly consistent with the medical images. To meet this challenge, we propose a Multimodal Recursive model with Contrastive Learning (MRCL). The proposed MRCL method incorporates both visual and semantic features to generate the “Impression” and “Findings” sections of radiology reports through a recursive network, in which a contrastive pre-training method is proposed to improve the expressiveness of both visual and textual representations. Extensive experiments and analyses demonstrate the efficacy of the proposed MRCL, which not only generates semantically coherent radiology reports but also outperforms state-of-the-art methods.
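The abstract names contrastive pre-training for aligning visual and textual representations but does not spell out the objective. A common formulation for this kind of image-report alignment is a symmetric InfoNCE loss (as popularized by CLIP-style training), where matched image-report pairs are pulled together and other pairs in the batch serve as negatives. A minimal NumPy sketch, assuming that objective; the function name, batch shapes, and temperature value are illustrative, not taken from the paper:

```python
import numpy as np

def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched image-report pairs sit on the diagonal of the similarity
    matrix; every other pair in the batch acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature          # (N, N) scaled similarities

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))       # matched pair = diagonal

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy check: perfectly aligned orthonormal embeddings give a near-zero loss,
# while random, unaligned embeddings give a clearly larger one.
aligned = np.eye(4)
loss_aligned = symmetric_info_nce(aligned, aligned)

rng = np.random.default_rng(0)
loss_random = symmetric_info_nce(rng.normal(size=(4, 16)),
                                 rng.normal(size=(4, 16)))
```

In practice the embeddings would come from the model's image and text encoders, and the loss would be minimized during the contrastive pre-training stage before report generation.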

Original language: English
Pages (from-to): 11185-11194
Number of pages: 10
Journal: Journal of Ambient Intelligence and Humanized Computing
Issue number: 8
Publication status: Published - Aug 2023
Externally published: Yes


Keywords:
  • Contrastive learning
  • Multimodal recursive model
  • Radiology report generation
  • Semantic representation
  • Visual representation
