TY - GEN
T1 - Attention based convolutional recurrent neural network for environmental sound classification
AU - Zhang, Zhichao
AU - Xu, Shugong
AU - Qiao, Tianhao
AU - Zhang, Shunqing
AU - Cao, Shan
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. ESC performance depends heavily on the effectiveness of the representative features extracted from environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model to focus on the semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. Experiments were conducted on the ESC-50 and ESC-10 datasets. Experimental results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art performance in terms of classification accuracy.
KW - Attention mechanism
KW - Convolutional recurrent neural network
KW - Environmental sound classification
UR - http://www.scopus.com/inward/record.url?scp=85086141906&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31654-9_23
DO - 10.1007/978-3-030-31654-9_23
M3 - Conference Proceeding
AN - SCOPUS:85086141906
SN - 9783030316532
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 261
EP - 271
BT - Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
A2 - Lin, Zhouchen
A2 - Wang, Liang
A2 - Tan, Tieniu
A2 - Yang, Jian
A2 - Shi, Guangming
A2 - Zheng, Nanning
A2 - Chen, Xilin
A2 - Zhang, Yanning
PB - Springer
T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Y2 - 8 November 2019 through 11 November 2019
ER -