TY - GEN
T1 - Bilateral-ViT For Robust Fovea Localization
AU - Song, Sifan
AU - Dang, Kang
AU - Yu, Qinji
AU - Wang, Zilong
AU - Coenen, Frans
AU - Su, Jionglong
AU - Ding, Xiaowei
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retinal diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two network branches: a transformer-based main network branch for integrating global context across the entire fundus image and a vessel branch for explicitly incorporating the structure of blood vessels. The encoded features from both network branches are subsequently merged with a customized Multi-scale Feature Fusion (MFF) module. Our comprehensive experiments demonstrate that the proposed approach is significantly more robust on diseased images and establishes a new state of the art on the Messidor and PALM datasets.
AB - The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retinal diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two network branches: a transformer-based main network branch for integrating global context across the entire fundus image and a vessel branch for explicitly incorporating the structure of blood vessels. The encoded features from both network branches are subsequently merged with a customized Multi-scale Feature Fusion (MFF) module. Our comprehensive experiments demonstrate that the proposed approach is significantly more robust on diseased images and establishes a new state of the art on the Messidor and PALM datasets.
KW - Bilateral Neural Network
KW - Feature Fusion
KW - Fovea Localization
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85129647739&partnerID=8YFLogxK
U2 - 10.1109/ISBI52829.2022.9761523
DO - 10.1109/ISBI52829.2022.9761523
M3 - Conference Proceeding
AN - SCOPUS:85129647739
T3 - Proceedings - International Symposium on Biomedical Imaging
BT - ISBI 2022 - Proceedings
PB - IEEE Computer Society
T2 - 19th IEEE International Symposium on Biomedical Imaging, ISBI 2022
Y2 - 28 March 2022 through 31 March 2022
ER -