Aggregated pyramid gating network for human pose estimation without pre-training

Chenru Jiang; Kaizhu Huang; Shufei Zhang; Xinheng Wang; Jimin Xiao; Yannis Goulermas

doi:10.1016/j.patcog.2023.109429

Chenru Jiang, Kaizhu Huang*, Shufei Zhang, Xinheng Wang, Jimin Xiao, Yannis Goulermas

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net), for human pose estimation, which can select, distill, and fuse semantic-level and natural-level information from multiple scales. In comparison, despite utilizing multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which is unfortunately not distilled by most existing approaches. Second, preferring deeper network structures to extract strong semantic features, conventional methods often ignore the fusion of original texture information. Third, to attain a good parameter initialization, current methods rely heavily on pre-training, which is very time-consuming or even unavailable. To better cope with the above problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, thus enabling end-to-end model training from scratch. On the COCO benchmark, PGA-Net consistently achieves improvements of over 3% over the baseline (without pre-training) under various model configurations.

Original language	English
Article number	109429
Journal	Pattern Recognition
Volume	138
DOIs	https://doi.org/10.1016/j.patcog.2023.109429
Publication status	Published - Jun 2023

Keywords

Human pose estimation
Pyramid gating system
Stabilization

Access to Document

10.1016/j.patcog.2023.109429

Cite this

@article{4955883db76b41f39e75875f2a075e0a,
  title = "Aggregated pyramid gating network for human pose estimation without pre-training",
  abstract = "In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net) for human pose estimation which can select, distill, and fuse semantic level and natural level information from multiple scales. In comparison, through utilizing multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which is unfortunately not distilled by most existing approaches. Second, preferring deeper network structures to extract strong semantic features, the conventional methods often ignore original texture information fusion. Third, to attain a good parameter initialization, the current methods heavily rely on pre-training, which is very time-consuming or even unavailable. While better coping with the above problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, enabling thus an end-to-end model training from scratch. In COCO benchmark, PGA-Net consistently achieves over 3\% improvements than the baseline (without pre-training) under various model configurations.",
  keywords = "Human pose estimation, Pyramid gating system, Stabilization",
  author = "Chenru Jiang and Kaizhu Huang and Shufei Zhang and Xinheng Wang and Jimin Xiao and Yannis Goulermas",
  note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Ltd",
  year = "2023",
  month = jun,
  doi = "10.1016/j.patcog.2023.109429",
  language = "English",
  volume = "138",
  journal = "Pattern Recognition",
  issn = "0031-3203",
}

TY  - JOUR
T1  - Aggregated pyramid gating network for human pose estimation without pre-training
AU  - Jiang, Chenru
AU  - Huang, Kaizhu
AU  - Zhang, Shufei
AU  - Wang, Xinheng
AU  - Xiao, Jimin
AU  - Goulermas, Yannis
PY  - 2023/6
Y1  - 2023/6
N2  - In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net) for human pose estimation which can select, distill, and fuse semantic level and natural level information from multiple scales. In comparison, through utilizing multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which is unfortunately not distilled by most existing approaches. Second, preferring deeper network structures to extract strong semantic features, the conventional methods often ignore original texture information fusion. Third, to attain a good parameter initialization, the current methods heavily rely on pre-training, which is very time-consuming or even unavailable. While better coping with the above problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, enabling thus an end-to-end model training from scratch. In COCO benchmark, PGA-Net consistently achieves over 3% improvements than the baseline (without pre-training) under various model configurations.
AB  - In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net) for human pose estimation which can select, distill, and fuse semantic level and natural level information from multiple scales. In comparison, through utilizing multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which is unfortunately not distilled by most existing approaches. Second, preferring deeper network structures to extract strong semantic features, the conventional methods often ignore original texture information fusion. Third, to attain a good parameter initialization, the current methods heavily rely on pre-training, which is very time-consuming or even unavailable. While better coping with the above problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, enabling thus an end-to-end model training from scratch. In COCO benchmark, PGA-Net consistently achieves over 3% improvements than the baseline (without pre-training) under various model configurations.
KW  - Human pose estimation
KW  - Pyramid gating system
KW  - Stabilization
UR  - http://www.scopus.com/inward/record.url?scp=85147994348&partnerID=8YFLogxK
U2  - 10.1016/j.patcog.2023.109429
DO  - 10.1016/j.patcog.2023.109429
M3  - Article
AN  - SCOPUS:85147994348
SN  - 0031-3203
VL  - 138
JO  - Pattern Recognition
JF  - Pattern Recognition
M1  - 109429
ER  -
