TY - JOUR
T1 - Bidirectional Long Short-Term Memory with Temporal Dense Sampling for human action recognition
AU - Tan, Kok Seang
AU - Lim, Kian Ming
AU - Lee, Chin Poo
AU - Kwek, Lee Chung
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/12/30
Y1 - 2022/12/30
AB - Long Short-Term Memory networks are making significant inroads into improving time series applications, including human action recognition. In a human action video, the spatial and temporal streams carry distinctive yet prominent information, so many researchers turn to spatio-temporal models for human action recognition. A spatio-temporal model integrates a temporal network (e.g. Long Short-Term Memory) and a spatial network (e.g. a Convolutional Neural Network). Existing human action recognition methods face a few challenges: (1) the uni-directional modeling of Long Short-Term Memory, which cannot preserve information from the future, (2) the sparse sampling strategy, which tends to lose prominent information when performing dimension reduction on the input of Long Short-Term Memory, and (3) the fusion strategy for consolidating the temporal network and spatial network. In view of this, we propose a Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method to address these challenges. The Temporal Dense Sampling partitions the human action video into segments and then performs a max-pooling operation along the temporal axis in each segment. A multi-stream bidirectional Long Short-Term Memory network is adopted to encode the long-term spatial and temporal dependencies in both forward and backward directions. Instead of assigning fixed weights to the spatial network and temporal network, we propose a fusion network in which a fully-connected layer is trained to adaptively assign the weights for the networks. The empirical results demonstrate that the proposed Bidirectional Long Short-Term Memory with Temporal Dense Sampling and Fusion Network method outperforms state-of-the-art methods, with an accuracy of 94.78% on the UCF101 dataset and 70.72% on the HMDB51 dataset.
KW - Bidirectional LSTM
KW - Human action recognition
KW - Temporal Dense Sampling
UR - http://www.scopus.com/inward/record.url?scp=85136037313&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2022.118484
DO - 10.1016/j.eswa.2022.118484
M3 - Article
AN - SCOPUS:85136037313
SN - 0957-4174
VL - 210
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 118484
ER -