TY - JOUR
T1 - Aggregated pyramid gating network for human pose estimation without pre-training
AU - Jiang, Chenru
AU - Huang, Kaizhu
AU - Zhang, Shufei
AU - Wang, Xinheng
AU - Xiao, Jimin
AU - Goulermas, Yannis
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2023/6
Y1 - 2023/6
N2 - In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net), for human pose estimation, which can select, distill, and fuse semantic-level and natural-level information from multiple scales. Although they utilize multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which most existing approaches unfortunately do not distill. Second, because they prefer deeper network structures to extract strong semantic features, conventional methods often ignore the fusion of original texture information. Third, to attain a good parameter initialization, current methods rely heavily on pre-training, which is very time-consuming or even unavailable. To better cope with these problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, thus enabling end-to-end model training from scratch. On the COCO benchmark, PGA-Net consistently achieves improvements of over 3% over the baseline (without pre-training) under various model configurations.
AB - In this work, we propose a comprehensive aggregated residual gating structure, the Pyramid GAting Network (PGA-Net), for human pose estimation, which can select, distill, and fuse semantic-level and natural-level information from multiple scales. Although they utilize multi-scale features, most existing state-of-the-art pose estimation methods are still limited in three aspects. First, multi-scale features contain massively redundant information, which most existing approaches unfortunately do not distill. Second, because they prefer deeper network structures to extract strong semantic features, conventional methods often ignore the fusion of original texture information. Third, to attain a good parameter initialization, current methods rely heavily on pre-training, which is very time-consuming or even unavailable. To better cope with these problems, our proposed PGA-Net distills high-level semantic features and replenishes low-level original information to reinforce module representation capability. Meanwhile, PGA-Net demonstrates notable training stability and superior performance even without pre-training. Extensive experiments demonstrate that our method consistently outperforms previous approaches even without pre-training, thus enabling end-to-end model training from scratch. On the COCO benchmark, PGA-Net consistently achieves improvements of over 3% over the baseline (without pre-training) under various model configurations.
KW - Human pose estimation
KW - Pyramid gating system
KW - Stabilization
UR - http://www.scopus.com/inward/record.url?scp=85147994348&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2023.109429
DO - 10.1016/j.patcog.2023.109429
M3 - Article
AN - SCOPUS:85147994348
SN - 0031-3203
VL - 138
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 109429
ER -