TY - JOUR
T1 - Multibranch Attention Networks for Action Recognition in Still Images
AU - Yan, Shiyang
AU - Smith, Jeremy S.
AU - Lu, Wenjin
AU - Zhang, Bailing
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2018/12
Y1 - 2018/12
AB - Contextual information plays an important role in visual recognition. This is especially true for action recognition, where contextual information, such as the objects a person interacts with and the scene in which the action is performed, is inseparable from a predefined action class. Meanwhile, the human attention mechanism shows a remarkable capability for discovering contextual information compared with existing computer vision systems. Inspired by this, we applied the soft attention mechanism by adding two extra branches to the original VGG16 model: one applies scene-level attention and the other applies region-level attention, capturing global and local contextual information, respectively. To ensure that the multibranch model converges well and is fully optimized, a two-step training method with an alternating optimization strategy is proposed. We call this model multibranch attention networks. To validate the effectiveness of the proposed approach in two experimental settings, with and without the bounding box of the target person, three publicly available human action datasets were used for evaluation. The method achieved state-of-the-art results on the PASCAL VOC action dataset and the Stanford 40 dataset in both experimental settings and performed well on the Humans Interacting with Common Objects (HICO) dataset.
KW - Action recognition
KW - contextual information
KW - multibranch CNN
KW - soft attention mechanism
UR - http://www.scopus.com/inward/record.url?scp=85038827751&partnerID=8YFLogxK
U2 - 10.1109/TCDS.2017.2783944
DO - 10.1109/TCDS.2017.2783944
M3 - Article
AN - SCOPUS:85038827751
SN - 2379-8920
VL - 10
SP - 1116
EP - 1125
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
IS - 4
M1 - 8214269
ER -