Abstract
Transformers have been widely applied to hyperspectral image classification, leveraging their self-attention mechanism for powerful global modelling. However, two key challenges remain: the excessive memory and computational cost of computing correlations between all tokens (especially as the image size or the number of spectral bands increases), and a limited ability to model local boundary information owing to the lack of explicit enhancement mechanisms. This paper proposes a novel method, the bridge transformer network fused with deep graph convolution (BTDGC), to address these issues. The framework comprises three components: a double random masking mechanism (DRMM) that forces the model to infer masked features from context during training; a bridge transformer (BT) module whose bridge tokens mediate cross-region feature interaction; and a deep graph convolutional pooling (DGCP) module that preserves spatial topology while aggregating hierarchical information. Experiments on standard hyperspectral datasets show that BTDGC outperforms mainstream methods in classification accuracy and robustness, effectively balancing global modelling and local boundary representation. The code is available at https://github.com/jenny3489/BTDGC.
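To make the masking idea concrete, the sketch below shows one plausible reading of a "double" random mask over hyperspectral patch tokens: zeroing a random subset of spectral bands and a random subset of spatial tokens so the model must reconstruct them from context. The function name, masking ratios, and tensor layout are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def double_random_mask(tokens, band_ratio=0.3, token_ratio=0.3, rng=None):
    """Illustrative double random masking (hypothetical, not the paper's DRMM).

    tokens: array of shape (num_tokens, num_bands) holding patch embeddings.
    Zeroes out a random fraction of spectral bands (columns) and a random
    fraction of spatial tokens (rows), returning the masked copy plus the
    masked indices so a training loss could target the hidden entries.
    """
    rng = np.random.default_rng(rng)
    masked = tokens.copy()
    n_tok, n_band = tokens.shape
    band_idx = rng.choice(n_band, size=int(n_band * band_ratio), replace=False)
    tok_idx = rng.choice(n_tok, size=int(n_tok * token_ratio), replace=False)
    masked[:, band_idx] = 0.0  # mask random spectral bands across all tokens
    masked[tok_idx, :] = 0.0   # mask random spatial tokens entirely
    return masked, band_idx, tok_idx
```

In a training loop, the unmasked input would serve as the reconstruction target for the masked positions, analogous to masked-feature pretraining.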
| Original language | English |
|---|---|
| Journal | CAAI Transactions on Intelligence Technology |
| Early online date | 28 Nov 2025 |
| DOIs | |
| Publication status | Published - 2026 |
Keywords
- convolution
- graph convolutional network
- masking mechanism
- transforms