A comparison of attention mechanisms of convolutional neural network in weakly labeled audio tagging

Yuanbo Hou*, Qiuqiang Kong, Shengchen Li

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

1 Citation (Scopus)

Abstract

Audio tagging aims to predict the types of sound events occurring in audio clips. Recently, the convolutional recurrent neural network (CRNN) has achieved state-of-the-art performance in audio tagging. In a CRNN, convolutional layers are applied to the input audio features to extract high-level representations, which are then passed to recurrent layers. To better learn high-level representations of acoustic features, attention mechanisms have been introduced into the convolutional layers of the CRNN. Attention is a learning technique that can steer the model toward the information most relevant to the task and thereby improve performance. Two attention mechanisms used in the CRNN, the Squeeze-and-Excitation (SE) block and the gated linear unit (GLU), are both based on a gating mechanism, but they focus on different aspects of the representation. To compare the SE block and the GLU, we apply a CRNN with an SE block (SE-CRNN) and a CRNN with a GLU (GLU-CRNN) to weakly labeled audio tagging and compare their results with the CRNN baseline. The experiments show that the GLU-CRNN achieves an area under the curve (AUC) score of 0.877 in polyphonic audio tagging, outperforming the SE-CRNN (0.865) and the CRNN baseline (0.838). These results indicate that, in a CRNN for weakly labeled polyphonic audio tagging, GLU-based attention performs better than SE-block-based attention.
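To make the difference between the two gating mechanisms concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `se_block` and `glu`, the reduction ratio `r`, the omitted biases, and the use of pointwise (1x1) channel mixing in the GLU in place of the full convolution kernels a CRNN would use are all illustrative assumptions. The point it shows is that the SE block computes one gate per channel from global (time-frequency averaged) statistics, whereas the GLU computes a separate gate for every channel, time, and frequency position.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (C, T, F).

    Hypothetical sketch: w1 has shape (C // r, C), w2 has shape (C, C // r);
    biases are omitted for brevity.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: one average per channel, shape (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation: one gate in (0, 1) per channel
    return x * s[:, None, None], s           # rescale each whole channel by its gate


def glu(x, w_lin, w_gate):
    """Gated linear unit on x of shape (C, T, F).

    Assumption: pointwise (1x1) channel mixing stands in for a convolution;
    w_lin and w_gate have shape (C_out, C).
    """
    linear = np.einsum("oc,ctf->otf", w_lin, x)           # linear path
    gate = sigmoid(np.einsum("oc,ctf->otf", w_gate, x))  # element-wise gate in (0, 1)
    return linear * gate, gate


rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 4, 2  # hypothetical channel, time, frequency, and reduction sizes
x = rng.standard_normal((C, T, F))
y_se, s = se_block(x, rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
y_glu, g = glu(x, rng.standard_normal((C, C)), rng.standard_normal((C, C)))
print(s.shape, g.shape)  # (8,) (8, 10, 4)
```

The printed shapes summarize the contrast: the SE block produces 8 channel weights, while the GLU produces a gate for each of the 8 x 10 x 4 feature-map entries.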

Original language: English
Title of host publication: Proceedings of the 6th Conference on Sound and Music Technology, CSMT - Revised Selected Papers, 2018
Editors: Wei Li, Shengchen Li, Xi Shao, Zijin Li
Publisher: Springer Verlag
Pages: 85-96
Number of pages: 12
ISBN (Print): 9789811387067
Publication status: Published - 2019
Externally published: Yes
Event: 6th Conference on Sound and Music Technology, CSMT 2018 - Xiamen, China
Duration: 24 Nov 2018 to 26 Nov 2018

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 568
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 6th Conference on Sound and Music Technology, CSMT 2018
Country/Territory: China
City: Xiamen
Period: 24/11/18 to 26/11/18

Keywords

  • Audio tagging
  • Convolutional neural network (CNN)
  • Convolutional recurrent neural network (CRNN)
  • Gated linear unit (GLU)
  • Squeeze-and-Excitation (SE) block
