TY - GEN
T1 - ViTMed: Vision Transformer for Medical Image Analysis
AU - Lim, Yu Jie
AU - Lim, Kian Ming
AU - Chang, Roy Kwang Yang
AU - Lee, Chin Poo
AU - Lim, Jit Yan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through the self-attention mechanism. The core element of ViTMed is the transformer encoder, comprising a multi-headed attention (MHA) mechanism and a feed-forward network. This enables the model to learn a hierarchical representation of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. These datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
AB - The COVID-19 global health crisis has presented daunting challenges to medical professionals, making accurate and efficient diagnoses more important than ever. In view of this, this paper proposes a Vision Transformer model, ViTMed, with an attention mechanism to classify CT scan images for more effective diagnosis of COVID-19. Each input CT scan image is represented as a sequence of tokens, and a transformer captures global and local dependencies between features through the self-attention mechanism. The core element of ViTMed is the transformer encoder, comprising a multi-headed attention (MHA) mechanism and a feed-forward network. This enables the model to learn a hierarchical representation of the image and make more informed predictions. The proposed ViTMed achieves promising performance with fewer parameters and computations than conventional Convolutional Neural Networks. Experimental results show that ViTMed outperforms state-of-the-art approaches on all three public COVID-19 benchmark datasets, achieving 98.38%, 90.48%, and 99.17% accuracy on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively. These datasets contain 2482, 746, and 19685 samples, respectively, and comprise two to three classes: COVID, non-COVID, and non-informative cases.
KW - Attention
KW - COVID-19
KW - CT-Scan
KW - Medical Image Analysis
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85174423662&partnerID=8YFLogxK
U2 - 10.1109/ICoICT58202.2023.10262548
DO - 10.1109/ICoICT58202.2023.10262548
M3 - Conference Proceeding
AN - SCOPUS:85174423662
T3 - 2023 11th International Conference on Information and Communication Technology, ICoICT 2023
SP - 277
EP - 282
BT - 2023 11th International Conference on Information and Communication Technology, ICoICT 2023
T2 - 11th International Conference on Information and Communication Technology, ICoICT 2023
Y2 - 23 August 2023 through 24 August 2023
ER -