TY - GEN
T1 - Pay Attention Selectively and Comprehensively
T2 - 28th ACM International Conference on Multimedia, MM 2020
AU - Jiang, Chenru
AU - Huang, Kaizhu
AU - Zhang, Shufei
AU - Wang, Xinheng
AU - Xiao, Jimin
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/10/12
Y1 - 2020/10/12
N2 - Deep neural networks with multi-scale feature fusion have achieved great success in human pose estimation. However, these methods still have drawbacks: 1) they treat multi-scale features equally, which may over-emphasize redundant features; 2) preferring deeper structures, they can learn features with strong semantic representation but tend to lose natural discriminative information; 3) to attain good performance, they rely heavily on pre-training, which is time-consuming or even unavailable in practice. To mitigate these problems, we propose a novel comprehensive recalibration model called Pyramid GAting Network (PGA-Net) that is capable of distilling, selecting, and fusing discriminative and attention-aware features at different scales and different levels (i.e., both semantic and natural levels). Meanwhile, by fusing features both selectively and comprehensively, PGA-Net demonstrates remarkable stability and encouraging performance even without pre-training, so that the model can be trained truly from scratch. We demonstrate the effectiveness of PGA-Net by validating it on the COCO and MPII benchmarks, attaining new state-of-the-art performance. https://github.com/ssr0512/PGA-Net
AB - Deep neural networks with multi-scale feature fusion have achieved great success in human pose estimation. However, these methods still have drawbacks: 1) they treat multi-scale features equally, which may over-emphasize redundant features; 2) preferring deeper structures, they can learn features with strong semantic representation but tend to lose natural discriminative information; 3) to attain good performance, they rely heavily on pre-training, which is time-consuming or even unavailable in practice. To mitigate these problems, we propose a novel comprehensive recalibration model called Pyramid GAting Network (PGA-Net) that is capable of distilling, selecting, and fusing discriminative and attention-aware features at different scales and different levels (i.e., both semantic and natural levels). Meanwhile, by fusing features both selectively and comprehensively, PGA-Net demonstrates remarkable stability and encouraging performance even without pre-training, so that the model can be trained truly from scratch. We demonstrate the effectiveness of PGA-Net by validating it on the COCO and MPII benchmarks, attaining new state-of-the-art performance. https://github.com/ssr0512/PGA-Net
KW - human pose estimation
KW - pyramid gating system
KW - stabilization
UR - http://www.scopus.com/inward/record.url?scp=85106925910&partnerID=8YFLogxK
U2 - 10.1145/3394171.3414041
DO - 10.1145/3394171.3414041
M3 - Conference Proceeding
AN - SCOPUS:85106925910
T3 - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
SP - 2364
EP - 2371
BT - MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 12 October 2020 through 16 October 2020
ER -