TY - GEN
T1 - Audio Tagging With Connectionist Temporal Classification Model Using Sequentially Labelled Data
AU - Hou, Yuanbo
AU - Kong, Qiuqiang
AU - Li, Shengchen
N1 - Publisher Copyright: © 2020, Springer Nature Singapore Pte Ltd.
PY - 2020
Y1 - 2020
N2 - Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only the presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequentially labelled data (SLD), where both the presence or absence and the order of sound events are known. To utilize SLD in audio tagging, we propose a convolutional recurrent neural network followed by a connectionist temporal classification objective function (CRNN-CTC) to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an area under the curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNNs with max pooling and average pooling, which score 0.908 and 0.815, respectively. In addition, we show that CRNN-CTC is able to predict the order of sound events in an audio clip.
KW - Audio tagging
KW - Connectionist temporal classification (CTC)
KW - Convolutional recurrent neural network (CRNN)
KW - Sequentially labelled data (SLD)
UR - http://www.scopus.com/inward/record.url?scp=85071497682&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-6504-1_114
DO - 10.1007/978-981-13-6504-1_114
M3 - Conference Proceeding
AN - SCOPUS:85071497682
SN - 9789811365034
T3 - Lecture Notes in Electrical Engineering
SP - 955
EP - 964
BT - Communications, Signal Processing, and Systems - Proceedings of the 2018 CSPS Volume II
A2 - Liang, Qilian
A2 - Liu, Xin
A2 - Na, Zhenyu
A2 - Wang, Wei
A2 - Mu, Jiasong
A2 - Zhang, Baoju
PB - Springer Verlag
T2 - International Conference on Communications, Signal Processing, and Systems, CSPS 2018
Y2 - 14 July 2018 through 16 July 2018
ER -