TY - JOUR
T1 - KDViT: COVID-19 diagnosis on CT-scans with knowledge distillation of vision transformer
AU - Lim, Yu Jie
AU - Lim, Kian Ming
AU - Chang, Roy Kwang Yang
AU - Lee, Chin Poo
N1 - Publisher Copyright:
© 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024
Y1 - 2024
N2 - This paper introduces Knowledge Distillation of Vision Transformer (KDViT), a novel approach for medical image classification. The Vision Transformer architecture incorporates a self-attention mechanism to autonomously learn image structure. The input medical image is segmented into patches and transformed into low-dimensional linear embeddings. Position information is integrated into each patch, and a learnable classification token is appended for classification, thereby preserving spatial relationships within the image. The output vectors are then fed into a Transformer encoder to extract both local and global features, leveraging the inherent attention mechanism for robust feature extraction across diverse medical imaging scenarios. Furthermore, knowledge distillation is employed to enhance performance by transferring insights from a large teacher model to a small student model. This approach reduces the computational requirements of the larger model and improves overall effectiveness. Integrating knowledge distillation with two Vision Transformer models not only showcases the novelty of the proposed solution for medical image classification but also enhances model interpretability, reduces computational complexity, and improves generalization capabilities. The proposed KDViT model achieved high accuracy rates of 98.39%, 88.57%, and 99.15% on the SARS-CoV-2-CT, COVID-CT, and iCTCF datasets, respectively, surpassing the performance of other state-of-the-art methods.
KW - COVID-19 image classification
KW - CT scan images
KW - knowledge distillation
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85193268578&partnerID=8YFLogxK
U2 - 10.1080/00051144.2024.2349416
DO - 10.1080/00051144.2024.2349416
M3 - Article
AN - SCOPUS:85193268578
SN - 0005-1144
VL - 65
SP - 1113
EP - 1126
JO - Automatika
JF - Automatika
IS - 3
ER -