Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation

Hengyan Liu; Wenzhang Zhang; Tianhong Dai; Longfei Yin; Guangyu Ren

doi:10.1109/ICCCN61486.2024.10637614

Corresponding author: Hengyan Liu

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

2 Citations (Scopus)

Abstract

Semantic segmentation confronts challenges with traditional networks tailored exclusively for RGB inputs, which may suffer from quality degradation under adverse conditions like low-level illumination or inclement weather. Recent advancements have shown promising outcomes by integrating RGB images with corresponding thermal infrared (TIR) images. However, effectively fusing features from both modalities remains a significant challenge. In this paper, we introduce a novel approach termed Multimodal Frequency Spectrum Fusion Schema (MFSFS) for semantic segmentation of RGB-T images. MFSFS leverages the advantages of the frequency spectrum to effectively extract and utilize multimodal feature information. To mitigate redundant information's adverse effects during multimodal fusion in the frequency domain, we propose a diversity-oriented contrastive learning approach. Simulation results demonstrate that MFSFS achieves competitive performance while maintaining a relatively smaller model size.

Original language	English
Title of host publication	ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798350384611
DOIs	https://doi.org/10.1109/ICCCN61486.2024.10637614
Publication status	Published - 2024
Event	33rd International Conference on Computer Communications and Networks, ICCCN 2024 - Big Island, United States; Duration: 29 Jul 2024 → 31 Jul 2024

Publication series

Name	Proceedings - International Conference on Computer Communications and Networks, ICCCN
ISSN (Print)	1095-2055

Conference

Conference	33rd International Conference on Computer Communications and Networks, ICCCN 2024
Country/Territory	United States
City	Big Island
Period	29/07/24 → 31/07/24

Keywords

Contrastive Learning
Determinantal point processes
Frequency Spectrum
Multimodal Fusion
Semantic Segmentation

Cite this

Liu, H., Zhang, W., Dai, T., Yin, L., & Ren, G. (2024). Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation. In ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks (Proceedings - International Conference on Computer Communications and Networks, ICCCN). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCCN61486.2024.10637614

Liu, Hengyan; Zhang, Wenzhang; Dai, Tianhong et al. / Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation. ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks. Institute of Electrical and Electronics Engineers Inc., 2024. (Proceedings - International Conference on Computer Communications and Networks, ICCCN).

@inproceedings{d8be617096a645a39e1427e3a43aee8b,

title = "Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation",

abstract = "Semantic segmentation confronts challenges with traditional networks tailored exclusively for RGB inputs, which may suffer from quality degradation under adverse conditions like low-level illumination or inclement weather. Recent advancements have shown promising outcomes by integrating RGB images with corresponding thermal infrared (TIR) images. However, effectively fusing features from both modalities remains a significant challenge. In this paper, we introduce a novel approach termed Multimodal Frequency Spectrum Fusion Schema (MFSFS) for semantic segmentation of RGB-T images. MFSFS leverages the advantages of the frequency spectrum to effectively extract and utilize multimodal feature information. To mitigate redundant information's adverse effects during multimodal fusion in the frequency domain, we propose a diversity-oriented contrastive learning approach. Simulation results demonstrate that MFSFS achieves competitive performance while maintaining a relatively smaller model size.",

keywords = "Contrastive Learning, Determinantal point processes, Frequency Spectrum, Multimodal Fusion, Semantic Segmentation",

author = "Hengyan Liu and Wenzhang Zhang and Tianhong Dai and Longfei Yin and Guangyu Ren",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 33rd International Conference on Computer Communications and Networks, ICCCN 2024 ; Conference date: 29-07-2024 Through 31-07-2024",

year = "2024",

doi = "10.1109/ICCCN61486.2024.10637614",

language = "English",

series = "Proceedings - International Conference on Computer Communications and Networks, ICCCN",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks",

}

Liu, H, Zhang, W, Dai, T, Yin, L & Ren, G 2024, Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation. in ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks. Proceedings - International Conference on Computer Communications and Networks, ICCCN, Institute of Electrical and Electronics Engineers Inc., 33rd International Conference on Computer Communications and Networks, ICCCN 2024, Big Island, United States, 29/07/24. https://doi.org/10.1109/ICCCN61486.2024.10637614

Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation. / Liu, Hengyan; Zhang, Wenzhang; Dai, Tianhong et al.
ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks. Institute of Electrical and Electronics Engineers Inc., 2024. (Proceedings - International Conference on Computer Communications and Networks, ICCCN).

TY - GEN

T1 - Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation

AU - Liu, Hengyan

AU - Zhang, Wenzhang

AU - Dai, Tianhong

AU - Yin, Longfei

AU - Ren, Guangyu

PY - 2024

Y1 - 2024

AB - Semantic segmentation confronts challenges with traditional networks tailored exclusively for RGB inputs, which may suffer from quality degradation under adverse conditions like low-level illumination or inclement weather. Recent advancements have shown promising outcomes by integrating RGB images with corresponding thermal infrared (TIR) images. However, effectively fusing features from both modalities remains a significant challenge. In this paper, we introduce a novel approach termed Multimodal Frequency Spectrum Fusion Schema (MFSFS) for semantic segmentation of RGB-T images. MFSFS leverages the advantages of the frequency spectrum to effectively extract and utilize multimodal feature information. To mitigate redundant information's adverse effects during multimodal fusion in the frequency domain, we propose a diversity-oriented contrastive learning approach. Simulation results demonstrate that MFSFS achieves competitive performance while maintaining a relatively smaller model size.

KW - Contrastive Learning

KW - Determinantal point processes

KW - Frequency Spectrum

KW - Multimodal Fusion

KW - Semantic Segmentation

UR - http://www.scopus.com/inward/record.url?scp=85203269718&partnerID=8YFLogxK

U2 - 10.1109/ICCCN61486.2024.10637614

DO - 10.1109/ICCCN61486.2024.10637614

M3 - Conference Proceeding

AN - SCOPUS:85203269718

T3 - Proceedings - International Conference on Computer Communications and Networks, ICCCN

BT - ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 33rd International Conference on Computer Communications and Networks, ICCCN 2024

Y2 - 29 July 2024 through 31 July 2024

ER -

Liu H, Zhang W, Dai T, Yin L, Ren G. Multimodal Frequency Spectrum Fusion Schema for RGB-T Image Semantic Segmentation. In ICCCN 2024 - 2024 33rd International Conference on Computer Communications and Networks. Institute of Electrical and Electronics Engineers Inc. 2024. (Proceedings - International Conference on Computer Communications and Networks, ICCCN). doi: 10.1109/ICCCN61486.2024.10637614
