Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data

Yuanbo Hou*, Qiuqiang Kong, Shengchen Li

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

2 Citations (Scopus)

Abstract

Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only the presence or absence of sound events is known and their order is not. To use the order information of sound events, we propose sequentially labelled data (SLD), where both the presence or absence and the order of sound events are known. To utilize SLD in audio tagging, we propose a convolutional recurrent neural network trained with a connectionist temporal classification objective function (CRNN-CTC) to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an area under the curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN, which scores 0.908 with max pooling and 0.815 with average pooling. In addition, we show that CRNN-CTC can predict the order of sound events in an audio clip.
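To make the difference between WLD and SLD concrete, the following is a minimal sketch of how a frame-wise CTC output path can be collapsed into an ordered event sequence, and how that sequence reduces to plain tags. It is not the authors' implementation: the abstract does not state how the model is decoded, so greedy best-path collapsing is assumed here, and the blank symbol, event names, and function names are all hypothetical.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; CTC reserves one extra output class for it


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a frame-wise label path into an ordered event sequence.

    Standard CTC collapsing: merge consecutive repeats, then drop blanks.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def sequence_to_tags(sequence):
    """Reduce an ordered sequence (SLD-style) to a set of tags (WLD-style)."""
    return set(sequence)


# Hypothetical per-frame argmax output of a network for one clip (assumed data).
path = ["-", "dog", "dog", "-", "-", "speech", "speech", "-", "dog", "-"]

sequence = ctc_greedy_decode(path)  # ordered events; an event may recur
tags = sequence_to_tags(sequence)   # presence or absence only, order discarded
```

Under these assumptions, the sequence keeps the information that one event occurs both before and after another, while the tag set keeps only which events are present. That is the extra information SLD carries over WLD.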

Original language: English
Title of host publication: Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II
Subtitle of host publication: Signal Processing
Editors: Qilian Liang, Xin Liu, Zhenyu Na, Wei Wang, Jiasong Mu, Baoju Zhang
Publisher: Springer Verlag
Pages: 955-964
Number of pages: 10
ISBN (Print): 9789811365034
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: International Conference on Communications, Signal Processing, and Systems, CSPS 2018 - Dalian, China
Duration: 14 Jul 2018 to 16 Jul 2018

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 516
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: International Conference on Communications, Signal Processing, and Systems, CSPS 2018
Country/Territory: China
City: Dalian
Period: 14/07/18 to 16/07/18

Keywords

  • Audio tagging
  • Connectionist temporal classification (CTC)
  • Convolutional recurrent neural network (CRNN)
  • Sequentially labelled data (SLD)
