Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data

Yuanbo Hou; Qiuqiang Kong; Shengchen Li

doi:10.1007/978-981-13-6504-1_114

Corresponding author: Yuanbo Hou

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

4 Citations (Scopus)

Abstract

Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequentially labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a convolutional recurrent neural network followed by a connectionist temporal classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an area under curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN of 0.908 and 0.815 with max pooling and average pooling, respectively. In addition, we show CRNN-CTC has the ability to predict the order of sound events in an audio clip.
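The abstract's step from frame-level CTC outputs to an ordered event sequence, and from there to clip-level tags, can be illustrated with greedy (best-path) CTC decoding. This is a minimal sketch under stated assumptions, not the authors' implementation: the label names, the blank index, and the toy posterior matrix are all hypothetical, and the paper may use a different decoding procedure.

```python
from typing import List, Sequence, Set

# Hypothetical label inventory; index 0 is assumed to be the CTC blank symbol.
LABELS = ["<blank>", "speech", "dog_bark", "door_slam"]
BLANK = 0


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge consecutive repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for probs in frame_probs:
        best = max(range(len(probs)), key=lambda k: probs[k])
        # Emit a label only when it differs from the previous frame and is not blank.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


def tags_from_sequence(sequence: Sequence[int]) -> Set[str]:
    """Clip-level tags are the set of events in the decoded sequence (order discarded)."""
    return {LABELS[k] for k in sequence}


# Toy per-frame posteriors (6 frames x 4 classes), invented for illustration only.
toy_probs = [
    [0.10, 0.80, 0.05, 0.05],
    [0.20, 0.70, 0.05, 0.05],
    [0.90, 0.03, 0.03, 0.04],
    [0.10, 0.10, 0.70, 0.10],
    [0.60, 0.10, 0.20, 0.10],
    [0.10, 0.70, 0.10, 0.10],
]

order = ctc_greedy_decode(toy_probs)
event_order = [LABELS[k] for k in order]
clip_tags = tags_from_sequence(order)
```

The decoded sequence keeps the order of events (the sequentially labelled view), while the set of its elements gives the presence or absence information that weak labels carry.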

Original language	English
Title of host publication	Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II
Subtitle of host publication	Signal Processing
Editors	Qilian Liang, Xin Liu, Zhenyu Na, Wei Wang, Jiasong Mu, Baoju Zhang
Publisher	Springer Verlag
Pages	955-964
Number of pages	10
ISBN (Print)	9789811365034
DOIs	https://doi.org/10.1007/978-981-13-6504-1_114
Publication status	Published - 2020
Externally published	Yes
Event	International Conference on Communications, Signal Processing, and Systems, CSPS 2018, Dalian, China, 14 Jul 2018 → 16 Jul 2018

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	516
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	International Conference on Communications, Signal Processing, and Systems, CSPS 2018
Country/Territory	China
City	Dalian
Period	14/07/18 → 16/07/18

Keywords

Audio tagging
Connectionist temporal classification (CTC)
Convolutional recurrent neural network (CRNN)
Sequentially labelled data (SLD)

Cite this

Hou, Y., Kong, Q., & Li, S. (2020). Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data. In Q. Liang, X. Liu, Z. Na, W. Wang, J. Mu, & B. Zhang (Eds.), Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II: Signal Processing (pp. 955-964). (Lecture Notes in Electrical Engineering; Vol. 516). Springer Verlag. https://doi.org/10.1007/978-981-13-6504-1_114

Hou, Yuanbo ; Kong, Qiuqiang ; Li, Shengchen. / Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data. Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II: Signal Processing. editor / Qilian Liang ; Xin Liu ; Zhenyu Na ; Wei Wang ; Jiasong Mu ; Baoju Zhang. Springer Verlag, 2020. pp. 955-964 (Lecture Notes in Electrical Engineering; Vol. 516).

@inproceedings{50e9cb3749214070912f08a25965e5b5,
title = "Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data",
abstract = "Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequentially labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a convolutional recurrent neural network followed by a connectionist temporal classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an area under curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN of 0.908 and 0.815 with max pooling and average pooling, respectively. In addition, we show CRNN-CTC has the ability to predict the order of sound events in an audio clip.",
keywords = "Audio tagging, Connectionist temporal classification (CTC), Convolutional recurrent neural network (CRNN), Sequentially labelled data (SLD)",
author = "Yuanbo Hou and Qiuqiang Kong and Shengchen Li",
note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Singapore Pte Ltd.; International Conference on Communications, Signal Processing, and Systems, CSPS 2018 ; Conference date: 14-07-2018 Through 16-07-2018",
year = "2020",
doi = "10.1007/978-981-13-6504-1_114",
language = "English",
isbn = "9789811365034",
series = "Lecture Notes in Electrical Engineering",
volume = "516",
publisher = "Springer Verlag",
pages = "955--964",
editor = "Qilian Liang and Xin Liu and Zhenyu Na and Wei Wang and Jiasong Mu and Baoju Zhang",
booktitle = "Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II",
}

Hou, Y, Kong, Q & Li, S 2020, Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data. in Q Liang, X Liu, Z Na, W Wang, J Mu & B Zhang (eds), Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II: Signal Processing. Lecture Notes in Electrical Engineering, vol. 516, Springer Verlag, pp. 955-964, International Conference on Communications, Signal Processing, and Systems, CSPS 2018, Dalian, China, 14/07/18. https://doi.org/10.1007/978-981-13-6504-1_114

Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data. / Hou, Yuanbo; Kong, Qiuqiang; Li, Shengchen.
Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II: Signal Processing. ed. / Qilian Liang; Xin Liu; Zhenyu Na; Wei Wang; Jiasong Mu; Baoju Zhang. Springer Verlag, 2020. p. 955-964 (Lecture Notes in Electrical Engineering; Vol. 516).

TY  - GEN
T1  - Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data
AU  - Hou, Yuanbo
AU  - Kong, Qiuqiang
AU  - Li, Shengchen
PY  - 2020
Y1  - 2020
AB  - Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequentially labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a convolutional recurrent neural network followed by a connectionist temporal classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an area under curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN of 0.908 and 0.815 with max pooling and average pooling, respectively. In addition, we show CRNN-CTC has the ability to predict the order of sound events in an audio clip.
KW  - Audio tagging
KW  - Connectionist temporal classification (CTC)
KW  - Convolutional recurrent neural network (CRNN)
KW  - Sequentially labelled data (SLD)
UR  - http://www.scopus.com/inward/record.url?scp=85071497682&partnerID=8YFLogxK
U2  - 10.1007/978-981-13-6504-1_114
DO  - 10.1007/978-981-13-6504-1_114
M3  - Conference Proceeding
AN  - SCOPUS:85071497682
SN  - 9789811365034
T3  - Lecture Notes in Electrical Engineering
VL  - 516
SP  - 955
EP  - 964
BT  - Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II
A2  - Liang, Qilian
A2  - Liu, Xin
A2  - Na, Zhenyu
A2  - Wang, Wei
A2  - Mu, Jiasong
A2  - Zhang, Baoju
PB  - Springer Verlag
T2  - International Conference on Communications, Signal Processing, and Systems, CSPS 2018
Y2  - 14 July 2018 through 16 July 2018
ER  - 

Hou Y, Kong Q, Li S. Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data. In Liang Q, Liu X, Na Z, Wang W, Mu J, Zhang B, editors, Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II: Signal Processing. Springer Verlag. 2020. p. 955-964. (Lecture Notes in Electrical Engineering; vol. 516). doi: 10.1007/978-981-13-6504-1_114
