Multi-view contextual adaptation network for weakly supervised object detection in remote sensing images

Binfeng Ye; Junjie Zhang; Yutao Rao; Rui Gao; Dan Zeng

doi:10.1080/01431161.2024.2359734

Corresponding author: Junjie Zhang

Shanghai University

Research output: Contribution to journal › Article › peer-review

Abstract

Weakly supervised object detection (WSOD) applies weakly supervised learning to object detection, significantly reducing annotation costs by relying only on image-level labels. However, WSOD has notable limitations. WSOD methods tend to identify only the most easily recognizable local regions of a target, which makes it difficult to delineate target boundaries accurately. Moreover, when multiple instances of the same class appear in adjacent locations, it is hard to distinguish the individual objects. In addition, the complex backgrounds and dense distribution of targets in remote sensing images (RSI) further exacerbate the difficulty of weakly supervised detection. To address these issues, we propose a model termed the Multi-View Contextual Adaptation Network (VCANet). Building on the classic Online Instance Classifier Refinement (OICR) framework, we incorporate contextual adaptation perception within a multi-view learning framework and integrate a pseudo-label filtering process. Contextual adaptation perception uses information from the surrounding environment to enhance localization, guiding the model to prioritize target objects by referring to their spatially neighbouring pixels. Multi-view learning creates additional constraints from diverse perspectives, thereby revealing objects that might be overlooked under the weak supervision of a single view. The pseudo-label filtering process eliminates inaccurate pseudo-labels by identifying reliable foregrounds, mitigating overlapping proposals during label propagation. On the challenging NWPU VHR-10.v2 and DIOR datasets, we achieve mAP of 62.3% and 28.2%, respectively, surpassing existing benchmarks.
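The abstract does not spell out how the pseudo-label filtering step works, so the following is only a minimal sketch of one common way to drop overlapping proposals: greedy, score-ordered suppression by intersection over union (IoU). It is not the authors' procedure. The function names `iou` and `filter_pseudo_labels`, the box layout `(x1, y1, x2, y2)`, and both threshold defaults are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_pseudo_labels(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_thresh: float = 0.5,  # hypothetical value, not from the paper
    iou_thresh: float = 0.5,    # hypothetical value, not from the paper
) -> List[int]:
    """Return indices of proposals kept as pseudo-labels for one class.

    Proposals scoring below score_thresh are discarded. The rest are
    visited from highest to lowest score, and a proposal is kept only
    if its IoU with every already-kept proposal is at most iou_thresh.
    """
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
    scores = [0.9, 0.8, 0.7, 0.2]
    print(filter_pseudo_labels(boxes, scores))
```

In this toy input, the second box overlaps the highest-scoring box heavily and the fourth box falls below the score threshold, so only the first and third proposals survive as pseudo-labels.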

Original language	English
Pages (from-to)	4344-4366
Number of pages	23
Journal	International Journal of Remote Sensing
Volume	45
Issue number	13
DOIs	https://doi.org/10.1080/01431161.2024.2359734
Publication status	Published - 2024
Externally published	Yes

Keywords

contextual adaptation
multi-view learning
remote sensing image
Weakly supervised object detection

Cite this

@article{8a3b858d445d4927934ff2e4bb719ebd,

title = "Multi-view contextual adaptation network for weakly supervised object detection in remote sensing images",

abstract = "Weakly supervised object detection (WSOD) applies weakly supervised learning to object detection, significantly reducing annotation costs by relying only on image-level labels. However, WSOD has notable limitations. WSOD methods tend to identify only the most easily recognizable local regions of a target, which makes it difficult to delineate target boundaries accurately. Moreover, when multiple instances of the same class appear in adjacent locations, it is hard to distinguish the individual objects. In addition, the complex backgrounds and dense distribution of targets in remote sensing images (RSI) further exacerbate the difficulty of weakly supervised detection. To address these issues, we propose a model termed the Multi-View Contextual Adaptation Network (VCANet). Building on the classic Online Instance Classifier Refinement (OICR) framework, we incorporate contextual adaptation perception within a multi-view learning framework and integrate a pseudo-label filtering process. Contextual adaptation perception uses information from the surrounding environment to enhance localization, guiding the model to prioritize target objects by referring to their spatially neighbouring pixels. Multi-view learning creates additional constraints from diverse perspectives, thereby revealing objects that might be overlooked under the weak supervision of a single view. The pseudo-label filtering process eliminates inaccurate pseudo-labels by identifying reliable foregrounds, mitigating overlapping proposals during label propagation. On the challenging NWPU VHR-10.v2 and DIOR datasets, we achieve mAP of 62.3\% and 28.2\%, respectively, surpassing existing benchmarks.",

keywords = "contextual adaptation, multi-view learning, remote sensing image, Weakly supervised object detection",

author = "Binfeng Ye and Junjie Zhang and Yutao Rao and Rui Gao and Dan Zeng",

note = "Publisher Copyright: {\textcopyright} 2024 Informa UK Limited, trading as Taylor \& Francis Group.",

year = "2024",

doi = "10.1080/01431161.2024.2359734",

language = "English",

volume = "45",

pages = "4344--4366",

journal = "International Journal of Remote Sensing",

issn = "0143-1161",

publisher = "Taylor \& Francis",

number = "13",

}

TY - JOUR

T1 - Multi-view contextual adaptation network for weakly supervised object detection in remote sensing images

AU - Ye, Binfeng

AU - Zhang, Junjie

AU - Rao, Yutao

AU - Gao, Rui

AU - Zeng, Dan

PY - 2024

Y1 - 2024

N2 - Weakly supervised object detection (WSOD) applies weakly supervised learning to object detection, significantly reducing annotation costs by relying only on image-level labels. However, WSOD has notable limitations. WSOD methods tend to identify only the most easily recognizable local regions of a target, which makes it difficult to delineate target boundaries accurately. Moreover, when multiple instances of the same class appear in adjacent locations, it is hard to distinguish the individual objects. In addition, the complex backgrounds and dense distribution of targets in remote sensing images (RSI) further exacerbate the difficulty of weakly supervised detection. To address these issues, we propose a model termed the Multi-View Contextual Adaptation Network (VCANet). Building on the classic Online Instance Classifier Refinement (OICR) framework, we incorporate contextual adaptation perception within a multi-view learning framework and integrate a pseudo-label filtering process. Contextual adaptation perception uses information from the surrounding environment to enhance localization, guiding the model to prioritize target objects by referring to their spatially neighbouring pixels. Multi-view learning creates additional constraints from diverse perspectives, thereby revealing objects that might be overlooked under the weak supervision of a single view. The pseudo-label filtering process eliminates inaccurate pseudo-labels by identifying reliable foregrounds, mitigating overlapping proposals during label propagation. On the challenging NWPU VHR-10.v2 and DIOR datasets, we achieve mAP of 62.3% and 28.2%, respectively, surpassing existing benchmarks.

KW - contextual adaptation

KW - multi-view learning

KW - remote sensing image

KW - Weakly supervised object detection

UR - http://www.scopus.com/inward/record.url?scp=85196537988&partnerID=8YFLogxK

U2 - 10.1080/01431161.2024.2359734

DO - 10.1080/01431161.2024.2359734

M3 - Article

AN - SCOPUS:85196537988

SN - 0143-1161

VL - 45

SP - 4344

EP - 4366

JO - International Journal of Remote Sensing

JF - International Journal of Remote Sensing

IS - 13

ER -
