Hybrid Multi-Class Token Vision Transformer Convolutional Network for DOA Estimation

Yuxuan Xie, Aifei Liu*, Xinyu Lu, Dufei Chong

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this letter, we propose an efficient hybrid model, named HMC-ViT, that combines a convolutional neural network (CNN) with a multi-class token vision transformer (ViT) to address the problem of direction of arrival (DOA) estimation. HMC-ViT integrates the local feature extraction capability of CNN with the global feature extraction capability of ViT to enhance DOA estimation performance and improve the computational efficiency of ViT. Additionally, the ViT component employs multiple class tokens in parallel to generate spatial spectra for sub-regions, further enhancing the model's performance. Simulation results demonstrate that the proposed method outperforms existing approaches under low signal-to-noise ratio (SNR) scenarios.
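To illustrate the multi-class token idea described in the abstract, the following is a minimal numpy sketch: CNN feature maps are flattened into patch tokens, several class tokens (one per angular sub-region) are prepended, one self-attention layer mixes them, and each class token is read out by its own head to produce the spatial spectrum of its sub-region. All dimensions, the single-head attention, and the per-token linear heads are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical dimensions (not taken from the paper):
n_patches, d = 16, 32      # patch tokens flattened from CNN feature maps
n_cls = 4                  # parallel class tokens, one per angular sub-region
bins_per_region = 45       # e.g. a 180-degree field split into 4 sub-regions

# Stand-in for CNN output reshaped into patch tokens
patch_tokens = rng.standard_normal((n_patches, d))

# Multiple learnable class tokens, prepended to the patch tokens
cls_tokens = rng.standard_normal((n_cls, d))
tokens = np.concatenate([cls_tokens, patch_tokens], axis=0)

# One single-head self-attention layer (MLP and residuals omitted for brevity)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
out = attn @ V

# Each class token feeds its own linear head, yielding the spatial spectrum
# of its sub-region; concatenating the heads covers the full angular range.
heads = rng.standard_normal((n_cls, d, bins_per_region)) / np.sqrt(d)
spectra = [out[i] @ heads[i] for i in range(n_cls)]
full_spectrum = np.concatenate(spectra)  # shape: (n_cls * bins_per_region,)

print(full_spectrum.shape)  # (180,)
```

Splitting the spectrum across parallel class tokens lets each token specialize in one sub-region, which is the stated mechanism behind the sub-region spatial spectra in HMC-ViT.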

Original language: English
Pages (from-to): 2279-2283
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 32
Publication status: Published - 2025

Keywords

  • Convolutional neural network (CNN)
  • deep learning
  • direction of arrival (DOA) estimation
  • vision transformer (ViT)

