A visual knowledge oriented approach for weakly supervised remote sensing object detection

doi:10.1016/j.neucom.2024.128114

Junjie Zhang, Binfeng Ye, Qiming Zhang, Yongshun Gong, Jianfeng Lu, Dan Zeng^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Weakly supervised learning poses significant challenges in remote sensing (RS) object detection due to the lack of precise instance annotations. This issue becomes particularly pronounced when dealing with complex backgrounds and dense target alignments in RS images. To address the above limitations, we propose a visual knowledge oriented approach that leverages visual cues as pseudo labels, thereby enhancing the supervision for object detection. The visual knowledge is mainly explored from two perspectives. First, recognizing that annotations are made solely at the image level, we aggregate objects of the same type across a group of images that share related semantic concepts. This allows us to infer instance-level annotations through collective knowledge. Second, owing to the bird's-eye view of RS images, certain object categories display distinctive visual patterns that are identifiable via expert knowledge. Specifically, with the multi-instance self-training framework as our base model, we establish correlations among images sharing the same class labels and use co-saliency to extract regions of common interest, thereby obtaining initial foregrounds in each image. Moreover, by leveraging the expert knowledge of class-specific visual patterns, we refine the pseudo labels and strengthen the foreground feature extraction by incorporating low-level visual cues. To further stabilize the training process and address potential noise in object proposals, we incorporate a two-stage training strategy to refine initial predictions. We validate the effectiveness of our proposed approach on two benchmark datasets, namely NWPU VHR-10.v2 and DIOR, and achieve mAP of 84.25% and 27.5%, respectively, significantly outperforming recent methods.

Original language	English
Article number	128114
Journal	Neurocomputing
Volume	597
DOIs	https://doi.org/10.1016/j.neucom.2024.128114
Publication status	Published - 7 Sept 2024
Externally published	Yes

Keywords

Co-saliency segmentation
Expert knowledge
Remote sensing images
Visual knowledge
Weakly-supervised learning

Access to Document

10.1016/j.neucom.2024.128114

Cite this

@article{a5e15cad19fa488699bc1f8cbe74aa37,
  title = "A visual knowledge oriented approach for weakly supervised remote sensing object detection",
  abstract = "Weakly supervised learning poses significant challenges in remote sensing (RS) object detection due to the lack of precise instance annotations. This issue becomes particularly pronounced when dealing with complex backgrounds and dense target alignments in RS images. To address the above limitations, we propose a visual knowledge oriented approach that leverages visual cues as pseudo labels, thereby enhancing the supervision for object detection. The visual knowledge is mainly explored from two perspectives. First, recognizing that annotations are made solely at the image level, we aggregate objects of the same type across a group of images that share related semantic concepts. This allows us to infer instance-level annotations through collective knowledge. Second, owing to the bird's-eye view of RS images, certain object categories display distinctive visual patterns that are identifiable via expert knowledge. Specifically, with the multi-instance self-training framework as our base model, we establish correlations among images sharing the same class labels and use co-saliency to extract regions of common interest, thereby obtaining initial foregrounds in each image. Moreover, by leveraging the expert knowledge of class-specific visual patterns, we refine the pseudo labels and strengthen the foreground feature extraction by incorporating low-level visual cues. To further stabilize the training process and address potential noise in object proposals, we incorporate a two-stage training strategy to refine initial predictions. We validate the effectiveness of our proposed approach on two benchmark datasets, namely NWPU VHR-10.v2 and DIOR, and achieve mAP of 84.25% and 27.5%, respectively, significantly outperforming recent methods.",
  keywords = "Co-saliency segmentation, Expert knowledge, Remote sensing images, Visual knowledge, Weakly-supervised learning",
  author = "Junjie Zhang and Binfeng Ye and Qiming Zhang and Yongshun Gong and Jianfeng Lu and Dan Zeng",
  note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",
  year = "2024",
  month = sep,
  day = "7",
  doi = "10.1016/j.neucom.2024.128114",
  language = "English",
  volume = "597",
  journal = "Neurocomputing",
  issn = "0925-2312",
}

TY  - JOUR
T1  - A visual knowledge oriented approach for weakly supervised remote sensing object detection
AU  - Zhang, Junjie
AU  - Ye, Binfeng
AU  - Zhang, Qiming
AU  - Gong, Yongshun
AU  - Lu, Jianfeng
AU  - Zeng, Dan
PY  - 2024/9/7
Y1  - 2024/9/7
N2  - Weakly supervised learning poses significant challenges in remote sensing (RS) object detection due to the lack of precise instance annotations. This issue becomes particularly pronounced when dealing with complex backgrounds and dense target alignments in RS images. To address the above limitations, we propose a visual knowledge oriented approach that leverages visual cues as pseudo labels, thereby enhancing the supervision for object detection. The visual knowledge is mainly explored from two perspectives. First, recognizing that annotations are made solely at the image level, we aggregate objects of the same type across a group of images that share related semantic concepts. This allows us to infer instance-level annotations through collective knowledge. Second, owing to the bird's-eye view of RS images, certain object categories display distinctive visual patterns that are identifiable via expert knowledge. Specifically, with the multi-instance self-training framework as our base model, we establish correlations among images sharing the same class labels and use co-saliency to extract regions of common interest, thereby obtaining initial foregrounds in each image. Moreover, by leveraging the expert knowledge of class-specific visual patterns, we refine the pseudo labels and strengthen the foreground feature extraction by incorporating low-level visual cues. To further stabilize the training process and address potential noise in object proposals, we incorporate a two-stage training strategy to refine initial predictions. We validate the effectiveness of our proposed approach on two benchmark datasets, namely NWPU VHR-10.v2 and DIOR, and achieve mAP of 84.25% and 27.5%, respectively, significantly outperforming recent methods.
AB  - Weakly supervised learning poses significant challenges in remote sensing (RS) object detection due to the lack of precise instance annotations. This issue becomes particularly pronounced when dealing with complex backgrounds and dense target alignments in RS images. To address the above limitations, we propose a visual knowledge oriented approach that leverages visual cues as pseudo labels, thereby enhancing the supervision for object detection. The visual knowledge is mainly explored from two perspectives. First, recognizing that annotations are made solely at the image level, we aggregate objects of the same type across a group of images that share related semantic concepts. This allows us to infer instance-level annotations through collective knowledge. Second, owing to the bird's-eye view of RS images, certain object categories display distinctive visual patterns that are identifiable via expert knowledge. Specifically, with the multi-instance self-training framework as our base model, we establish correlations among images sharing the same class labels and use co-saliency to extract regions of common interest, thereby obtaining initial foregrounds in each image. Moreover, by leveraging the expert knowledge of class-specific visual patterns, we refine the pseudo labels and strengthen the foreground feature extraction by incorporating low-level visual cues. To further stabilize the training process and address potential noise in object proposals, we incorporate a two-stage training strategy to refine initial predictions. We validate the effectiveness of our proposed approach on two benchmark datasets, namely NWPU VHR-10.v2 and DIOR, and achieve mAP of 84.25% and 27.5%, respectively, significantly outperforming recent methods.
KW  - Co-saliency segmentation
KW  - Expert knowledge
KW  - Remote sensing images
KW  - Visual knowledge
KW  - Weakly-supervised learning
UR  - http://www.scopus.com/inward/record.url?scp=85197385237&partnerID=8YFLogxK
U2  - 10.1016/j.neucom.2024.128114
DO  - 10.1016/j.neucom.2024.128114
M3  - Article
AN  - SCOPUS:85197385237
SN  - 0925-2312
VL  - 597
JO  - Neurocomputing
JF  - Neurocomputing
M1  - 128114
ER  - 
