CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation

Junding Sun; Xiaopeng Zheng; Xiaosheng Wu; Chaosheng Tang; Shuihua Wang; Yudong Zhang

doi:10.1002/ima.23184

Junding Sun*, Xiaopeng Zheng, Xiaosheng Wu, Chaosheng Tang, Shuihua Wang, Yudong Zhang*

*Corresponding author for this work

Xi'an Jiaotong-Liverpool University

Research output: Contribution to journal › Article › peer-review

Abstract

Because the Transformer captures long-range dependencies through Self-Attention, it has shown great potential in medical image segmentation. However, it cannot readily model local relationships between pixels, so many previous approaches embed the Transformer into a CNN encoder. These methods still often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in ineffective capture of each object's scale differences and in the loss of features for small targets. Furthermore, the high complexity of the Transformer makes it challenging to integrate local and global information effectively within the same scale. To address these limitations, we propose a novel backbone network called CasUNeXt, which has three main design elements: (1) We use a cascade design to rework how the CNN and the Transformer are combined, which enhances the modeling of the interrelationships among multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions; it both strengthens feature extraction within a single scale and models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism so that it models context information in feature maps from multiple perspectives within the channel dimension. Together, these design features enable CasUNeXt to better integrate local and global information and to capture relationships between multi-scale features, thereby improving medical image segmentation performance. In experimental comparisons on various benchmark datasets, CasUNeXt surpasses current state-of-the-art methods in medical image segmentation.

Original language	English
Article number	e23184
Journal	International Journal of Imaging Systems and Technology
Volume	34
Issue number	5
DOIs	https://doi.org/10.1002/ima.23184
Publication status	Published - Sept 2024

Keywords

cascade
CNN
multi-scale features
transformer

Cite this

@article{52e39e92a7f34e9c87b18a9cb6f47e2b,

title = "CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation",

abstract = "Due to the Transformer's ability to capture long-range dependencies through Self-Attention, it has shown immense potential in medical image segmentation. However, it lacks the capability to model local relationships between pixels. Therefore, many previous approaches embedded the Transformer into the CNN encoder. However, current methods often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and the loss of features for small targets. Furthermore, due to the high complexity of the Transformer, it is challenging to integrate local and global information within the same scale effectively. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascade to redesign the way CNN and Transformer are combined to enhance modeling the unique interrelationships between multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions. It not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism to enable it to model context information in feature maps from multiple perspectives within the channel dimension. These design features collectively enable CasUNeXt to better integrate local and global information and capture relationships between multi-scale features, thereby improving the performance of medical image segmentation. Through experimental comparisons on various benchmark datasets, our CasUNeXt method exhibits outstanding performance in medical image segmentation tasks, surpassing the current state-of-the-art methods.",

keywords = "cascade, CNN, multi-scale features, transformer",

author = "Junding Sun and Xiaopeng Zheng and Xiaosheng Wu and Chaosheng Tang and Shuihua Wang and Yudong Zhang",

note = "Publisher Copyright: {\textcopyright} 2024 The Author(s). International Journal of Imaging Systems and Technology published by Wiley Periodicals LLC.",

year = "2024",

month = sep,

doi = "10.1002/ima.23184",

language = "English",

volume = "34",

journal = "International Journal of Imaging Systems and Technology",

issn = "0899-9457",

number = "5",

}

TY - JOUR

T1 - CasUNeXt: A Cascaded Transformer With Intra- and Inter-Scale Information for Medical Image Segmentation

AU - Sun, Junding

AU - Zheng, Xiaopeng

AU - Wu, Xiaosheng

AU - Tang, Chaosheng

AU - Wang, Shuihua

AU - Zhang, Yudong

PY - 2024/9

Y1 - 2024/9

AB - Due to the Transformer's ability to capture long-range dependencies through Self-Attention, it has shown immense potential in medical image segmentation. However, it lacks the capability to model local relationships between pixels. Therefore, many previous approaches embedded the Transformer into the CNN encoder. However, current methods often fall short in modeling the relationships between multi-scale features, specifically the spatial correspondence between features at different scales. This limitation can result in the ineffective capture of scale differences for each object and the loss of features for small targets. Furthermore, due to the high complexity of the Transformer, it is challenging to integrate local and global information within the same scale effectively. To address these limitations, we propose a novel backbone network called CasUNeXt, which features three appealing design elements: (1) We use the idea of cascade to redesign the way CNN and Transformer are combined to enhance modeling the unique interrelationships between multi-scale information. (2) We design a Cascaded Scale-wise Transformer Module capable of cross-scale interactions. It not only strengthens feature extraction within a single scale but also models interactions between different scales. (3) We overhaul the multi-head Channel Attention mechanism to enable it to model context information in feature maps from multiple perspectives within the channel dimension. These design features collectively enable CasUNeXt to better integrate local and global information and capture relationships between multi-scale features, thereby improving the performance of medical image segmentation. Through experimental comparisons on various benchmark datasets, our CasUNeXt method exhibits outstanding performance in medical image segmentation tasks, surpassing the current state-of-the-art methods.

KW - cascade

KW - CNN

KW - multi-scale features

KW - transformer

UR - http://www.scopus.com/inward/record.url?scp=85204705531&partnerID=8YFLogxK

U2 - 10.1002/ima.23184

DO - 10.1002/ima.23184

M3 - Article

AN - SCOPUS:85204705531

SN - 0899-9457

VL - 34

JO - International Journal of Imaging Systems and Technology

JF - International Journal of Imaging Systems and Technology

IS - 5

M1 - e23184

ER -
