TY - GEN
T1 - An improved two-stream 3D convolutional neural network for human action recognition
AU - Chen, Jun
AU - Xu, Yuanping
AU - Zhang, Chaolong
AU - Xu, Zhijie
AU - Meng, Xiangxiang
AU - Wang, Jie
N1 - Publisher Copyright:
© 2019 Chinese Automation and Computing Society in the UK-CACSUK.
PY - 2019/9
Y1 - 2019/9
N2 - To obtain global contextual information precisely from videos with heavy camera motion and scene changes, this study proposes an improved spatiotemporal two-stream neural network architecture with a novel convolutional fusion layer. The three main improvements of this study are: 1) the ResNet-101 network is integrated into the two streams of the target network independently; 2) the two kinds of feature maps (i.e., optical flow motion and RGB-channel information) obtained by the corresponding convolution layers of the two streams are superimposed on each other; 3) the temporal information is combined with the spatial information by the integrated three-dimensional (3D) convolutional neural network (CNN) to extract more latent information from the videos. The proposed approach was tested on the UCF-101 and HMDB51 benchmark datasets, and the experimental results show that the proposed two-stream 3D CNN model gains a substantial improvement in recognition rate for video-based analysis.
AB - To obtain global contextual information precisely from videos with heavy camera motion and scene changes, this study proposes an improved spatiotemporal two-stream neural network architecture with a novel convolutional fusion layer. The three main improvements of this study are: 1) the ResNet-101 network is integrated into the two streams of the target network independently; 2) the two kinds of feature maps (i.e., optical flow motion and RGB-channel information) obtained by the corresponding convolution layers of the two streams are superimposed on each other; 3) the temporal information is combined with the spatial information by the integrated three-dimensional (3D) convolutional neural network (CNN) to extract more latent information from the videos. The proposed approach was tested on the UCF-101 and HMDB51 benchmark datasets, and the experimental results show that the proposed two-stream 3D CNN model gains a substantial improvement in recognition rate for video-based analysis.
KW - Human Action Recognition
KW - Optical Flow
KW - Three-dimensional CNN
KW - Two-stream CNN
UR - http://www.scopus.com/inward/record.url?scp=85075794417&partnerID=8YFLogxK
U2 - 10.23919/IConAC.2019.8894962
DO - 10.23919/IConAC.2019.8894962
M3 - Conference Proceeding
AN - SCOPUS:85075794417
T3 - ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing
BT - ICAC 2019 - 2019 25th IEEE International Conference on Automation and Computing
A2 - Yu, Hui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Automation and Computing, ICAC 2019
Y2 - 5 September 2019 through 7 September 2019
ER -