TY - GEN
T1 - Dual-Domain Feature-Guided Task Alignment for Enhanced Small Object Detection
AU - Guo, Fangrui
AU - Wu, Junwei
AU - Zhang, Quan
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Small object detection is a critical challenge in Unmanned Aerial Vehicle (UAV) imagery: small objects occupy few pixels, and successive pooling operations frequently cause them to disappear into intricate backgrounds. To tackle this issue, we propose the Small Object Enhancement Pyramid (SOEP) module, which first transforms spatial-domain feature representations into the frequency domain to better capture small objects, which are typically characterized by high-frequency components. These feature representations are then fused in the spatial domain using a frequency-based attention map, enhancing small object representations by integrating information from the two complementary domains. Furthermore, we introduce a Task Aligned Head (TAH) that learns classification and localization interactively, reducing the misalignment that occurs when these tasks are learned independently, particularly for small objects. Experimental results on the VisDrone dataset show that the proposed method (D2FTA) outperforms the baseline by 12.7% and 14.19% on mAP0.5 and mAP0.5:0.95, respectively.
KW - frequency domain
KW - small object detection
KW - spatial domain
UR - http://www.scopus.com/inward/record.url?scp=105003877954&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49660.2025.10888564
DO - 10.1109/ICASSP49660.2025.10888564
M3 - Conference Proceeding
AN - SCOPUS:105003877954
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D.
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -