Multimodal contrastive learning for radiology report generation

Xing Wu*, Jingwen Li, Jianjia Wang, Quan Qian

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Automated radiology report generation can not only lighten the workload of clinicians but also improve the efficiency of disease diagnosis. However, it is challenging to generate radiology reports that are both semantically coherent and highly consistent with the medical images. To meet this challenge, we propose a Multimodal Recursive model with Contrastive Learning (MRCL). The proposed MRCL method incorporates both visual and semantic features to generate the “Impression” and “Findings” sections of radiology reports through a recursive network, in which a contrastive pre-training method is proposed to improve the expressiveness of both visual and textual representations. Extensive experiments and analyses demonstrate the efficacy of the proposed MRCL, which not only generates semantically coherent radiology reports but also outperforms state-of-the-art methods.
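The abstract names contrastive pre-training for aligning visual and textual representations but does not spell out the objective. A common formulation for this kind of image-report alignment is a symmetric InfoNCE loss (as popularized by CLIP-style training), where matched image-report pairs are pulled together and other pairs in the batch serve as negatives. A minimal NumPy sketch, assuming that objective; the function name, batch shapes, and temperature value are illustrative, not taken from the paper:

```python
import numpy as np

def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched image-report pairs sit on the diagonal of the similarity
    matrix; every other pair in the batch acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature          # (N, N) scaled similarities

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))       # matched pair = diagonal

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy check: perfectly aligned orthonormal embeddings give a near-zero loss,
# while random, unaligned embeddings give a clearly larger one.
aligned = np.eye(4)
loss_aligned = symmetric_info_nce(aligned, aligned)

rng = np.random.default_rng(0)
loss_random = symmetric_info_nce(rng.normal(size=(4, 16)),
                                 rng.normal(size=(4, 16)))
```

In practice the embeddings would come from the model's image and text encoders, and the loss would be minimized during the contrastive pre-training stage before report generation.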

Original language: English
Pages (from-to): 11185-11194
Number of pages: 10
Journal: Journal of Ambient Intelligence and Humanized Computing
Issue number: 8
Publication status: Published - Aug 2023
Externally published: Yes


Keywords:
  • Contrastive learning
  • Multimodal recursive model
  • Radiology report generation
  • Semantic representation
  • Visual representation
