Leveraging Frequency-Guided Mixer and Target-Aware Attention for Ground-Based Cloud Detection

Chenyu Dong; Guanyi Li; Yixiao Gu; Junjie Zhang; Dan Zeng

doi:10.1109/LGRS.2024.3381755

Chenyu Dong, Guanyi Li, Yixiao Gu, Junjie Zhang, Dan Zeng*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Compared to satellite imagery, ground-based cameras capture cloud data (ground-to-sky data) with higher temporal and spatial resolutions, providing more detailed cloud information. However, the spectral information available in ground-to-sky data is limited. Therefore, extracting features with strong discrimination from optical remote sensing images (ORSIs) is challenging. Currently, deep-learning-based cloud detection methods face two main challenges. First, although convolutional neural networks (CNNs) effectively extract high-frequency (HF) components from images through convolutions, they struggle to capture low-frequency (LF) components, which are capable of representing global features and target structures. Second, in ORSIs, the spectral characteristics of thin clouds and the sky are similar, making it difficult to distinguish cloud regions from the background. To address these challenges, we propose a network consisting of two main modules: the mixer module (MM) and the cloud-aware attention module (CAAM). The MM comprises an HF component extraction branch and an LF component extraction branch. The HF branch extracts local textures through max-pooling and parallel convolution operations. The LF branch captures long-range dependencies by decomposing a large kernel convolution. It leverages the advantages of both convolution and self-attention to effectively capture global features. In addition, we introduce the CAAM, which quantifies images into histograms to separate clouds from the background and enhances the perception of clouds using an attention mechanism. We conducted experiments using both daytime and nighttime cloud image data from the SWINySeg dataset, with mIoU reaching 88.93% and overall accuracy (OA) reaching 93.97%. The results demonstrate that our proposed method achieves promising performance compared to state-of-the-art cloud detection methods.
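
The abstract reports results as mIoU and overall accuracy (OA) but does not define them. As a minimal sketch, assuming the standard binary-segmentation definitions (1 = cloud, 0 = sky) and masks flattened to sequences, the two metrics can be computed as below. The function names and the toy masks are illustrative only and are not taken from the paper, which may aggregate the metrics differently (for example, per image or over the whole dataset).

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for flattened binary masks, where 1 = cloud."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def overall_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """OA: fraction of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


def mean_iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """mIoU: intersection-over-union averaged over the cloud and sky classes.

    A class that is absent from both masks (empty union) is skipped,
    which is one common convention and an assumption here.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    ious = []
    if tp + fp + fn > 0:  # cloud class
        ious.append(tp / (tp + fp + fn))
    if tn + fp + fn > 0:  # sky class
        ious.append(tn / (tn + fp + fn))
    return sum(ious) / len(ious)


# Toy 8-pixel example (hypothetical data, not from the paper).
pred_mask = [1, 1, 0, 0, 1, 0, 0, 0]
true_mask = [1, 0, 0, 0, 1, 1, 0, 0]
print(f"OA   = {overall_accuracy(pred_mask, true_mask):.4f}")
print(f"mIoU = {mean_iou(pred_mask, true_mask):.4f}")
```

On the toy masks there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so OA is 6/8 and mIoU is the mean of 2/4 (cloud) and 4/6 (sky).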

Original language	English
Article number	6008205
Pages (from-to)	1-5
Number of pages	5
Journal	IEEE Geoscience and Remote Sensing Letters
Volume	21
DOIs	https://doi.org/10.1109/LGRS.2024.3381755
Publication status	Published - 2024
Externally published	Yes

Keywords

Attention mechanism
cloud detection
deep learning

Access to Document

10.1109/LGRS.2024.3381755

Cite this

@article{27ff7d1554004f0ea703c45e428b8d53,

title = "Leveraging Frequency-Guided Mixer and Target-Aware Attention for Ground-Based Cloud Detection",

abstract = "Compared to satellite imagery, ground-based cameras capture cloud data (ground-to-sky data) with higher temporal and spatial resolutions, providing more detailed cloud information. However, the spectral information available in ground-to-sky data is limited. Therefore, extracting features with strong discrimination from optical remote sensing images (ORSIs) is challenging. Currently, deep-learning-based cloud detection methods face two main challenges. First, although convolutional neural networks (CNNs) effectively extract high-frequency (HF) components from images through convolutions, they struggle to capture low-frequency (LF) components, which are capable of representing global features and target structures. Second, in ORSIs, the spectral characteristics of thin clouds and the sky are similar, making it difficult to distinguish cloud regions from the background. To address these challenges, we propose a network consisting of two main modules: the mixer module (MM) and the cloud-aware attention module (CAAM). The MM comprises an HF and an LF components extraction branch. The HF branch extracts local textures through max-pooling and parallel convolution operations. The LF branch captures long-range dependency by decomposing a large kernel convolution. It leverages the advantages of both convolution and self-attention to effectively capture global features. In addition, we introduce the CAAM, which quantifies images into histograms to separate clouds from the background and enhances the perception of clouds using an attention mechanism. We conducted experiments using both daytime and nighttime cloud image data from the SWINySeg dataset with mIoU reaching 88.93% and overall accuracy (OA) reaching 93.97%. The results demonstrate that our proposed method achieves promising performance compared to state-of-the-art cloud detection methods.",

keywords = "Attention mechanism, cloud detection, deep learning",

author = "Chenyu Dong and Guanyi Li and Yixiao Gu and Junjie Zhang and Dan Zeng",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2024",

doi = "10.1109/LGRS.2024.3381755",

language = "English",

volume = "21",

pages = "1--5",

journal = "IEEE Geoscience and Remote Sensing Letters",

issn = "1545-598X",

}

TY - JOUR

T1 - Leveraging Frequency-Guided Mixer and Target-Aware Attention for Ground-Based Cloud Detection

AU - Dong, Chenyu

AU - Li, Guanyi

AU - Gu, Yixiao

AU - Zhang, Junjie

AU - Zeng, Dan

PY - 2024

Y1 - 2024

N2 - Compared to satellite imagery, ground-based cameras capture cloud data (ground-to-sky data) with higher temporal and spatial resolutions, providing more detailed cloud information. However, the spectral information available in ground-to-sky data is limited. Therefore, extracting features with strong discrimination from optical remote sensing images (ORSIs) is challenging. Currently, deep-learning-based cloud detection methods face two main challenges. First, although convolutional neural networks (CNNs) effectively extract high-frequency (HF) components from images through convolutions, they struggle to capture low-frequency (LF) components, which are capable of representing global features and target structures. Second, in ORSIs, the spectral characteristics of thin clouds and the sky are similar, making it difficult to distinguish cloud regions from the background. To address these challenges, we propose a network consisting of two main modules: the mixer module (MM) and the cloud-aware attention module (CAAM). The MM comprises an HF and an LF components extraction branch. The HF branch extracts local textures through max-pooling and parallel convolution operations. The LF branch captures long-range dependency by decomposing a large kernel convolution. It leverages the advantages of both convolution and self-attention to effectively capture global features. In addition, we introduce the CAAM, which quantifies images into histograms to separate clouds from the background and enhances the perception of clouds using an attention mechanism. We conducted experiments using both daytime and nighttime cloud image data from the SWINySeg dataset with mIoU reaching 88.93% and overall accuracy (OA) reaching 93.97%. The results demonstrate that our proposed method achieves promising performance compared to state-of-the-art cloud detection methods.

KW - Attention mechanism

KW - cloud detection

KW - deep learning

UR - http://www.scopus.com/inward/record.url?scp=85189373863&partnerID=8YFLogxK

U2 - 10.1109/LGRS.2024.3381755

DO - 10.1109/LGRS.2024.3381755

M3 - Article

AN - SCOPUS:85189373863

SN - 1545-598X

VL - 21

SP - 1

EP - 5

JO - IEEE Geoscience and Remote Sensing Letters

JF - IEEE Geoscience and Remote Sensing Letters

M1 - 6008205

ER -
