Attention based convolutional recurrent neural network for environmental sound classification

Zhichao Zhang, Shugong Xu*, Tianhao Qiao, Shunqing Zhang, Shan Cao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

15 Citations (Scopus)

Abstract

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. ESC performance depends heavily on the effectiveness of the representative features extracted from environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model that focuses on semantically relevant and salient frames. Specifically, we first propose a convolutional recurrent neural network (CRNN) to learn spectro-temporal features and temporal correlations. We then extend this CRNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. Experiments were conducted on the ESC-50 and ESC-10 datasets. The experimental results demonstrate the effectiveness of the proposed method, which achieves state-of-the-art classification accuracy.
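To make the idea of frame-level attention concrete, the sketch below shows one common way to pool a sequence of per-frame features (for example, the per-frame outputs of a CRNN's recurrent layer) into a single clip-level vector: each frame gets a scalar relevance score, the scores are normalised with a softmax, and the frames are averaged with those weights. This is a minimal sketch, not the authors' implementation. The additive `tanh` scoring function and the names `frame_attention_pool`, `W`, `b` and `v` are assumptions chosen for illustration; the paper's exact scoring function may differ, and in a real model these parameters would be learned jointly with the network. Plain Python is used so the example runs without any deep-learning framework.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_attention_pool(frames: Matrix, W: Matrix, b: Vector,
                         v: Vector) -> Tuple[Vector, Vector]:
    """Additive frame-level attention pooling (hypothetical parameterisation).

    frames: T x D per-frame features (e.g. recurrent-layer outputs)
    W: A x D projection matrix, b: length-A bias, v: length-A scoring vector
    Returns the pooled D-dimensional clip vector and the T attention weights.
    """
    scores = []
    for h in frames:
        # hidden = tanh(W h + b)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
                  for row, b_i in zip(W, b)]
        # scalar relevance score for this frame: v . hidden
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, hidden)))
    # attention weights are non-negative and sum to 1 across frames
    alpha = softmax(scores)
    dim = len(frames[0])
    # weighted average of frames: salient frames dominate the clip vector
    pooled = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    return pooled, alpha


if __name__ == "__main__":
    # Toy clip: two silent (all-zero) frames around one active frame.
    frames = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    v = [1.0, 1.0]
    pooled, alpha = frame_attention_pool(frames, W, b, v)
    print("attention weights:", alpha)
    print("pooled clip vector:", pooled)
```

With these toy parameters the active middle frame receives a larger weight than the two silent frames, which is the behaviour the abstract describes: the pooled representation is driven mainly by salient frames rather than by silence.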

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
Editors: Zhouchen Lin, Liang Wang, Tieniu Tan, Jian Yang, Guangming Shi, Nanning Zheng, Xilin Chen, Yanning Zhang
Publisher: Springer
Pages: 261-271
Number of pages: 11
ISBN (Print): 9783030316532
DOIs:
Publication status: Published - 2019
Externally published: Yes
Event: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019 - Xi'an, China
Duration: 8 Nov 2019 to 11 Nov 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11857 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Country/Territory: China
City: Xi'an
Period: 8/11/19 to 11/11/19

Keywords

  • Attention mechanism
  • Convolutional recurrent neural network
  • Environmental sound classification
