TY - GEN
T1 - Long short-term memory recurrent neural network based segment features for music genre classification
AU - Dai, Jia
AU - Liang, Shan
AU - Xue, Wei
AU - Ni, Chongjia
AU - Liu, Wenju
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/5/2
Y1 - 2017/5/2
N2 - In conventional frame-feature-based music genre classification methods, the audio data is represented by independent frames and the sequential nature of audio is totally ignored. If the sequential knowledge is well modeled and combined, the classification performance can be significantly improved. The long short-term memory (LSTM) recurrent neural network (RNN), which uses a set of special memory cells to model long-range feature sequences, has been successfully used for many sequence labeling and sequence prediction tasks. In this paper, we propose LSTM RNN based segment features for music genre classification. The LSTM RNN is used to learn the representation of the LSTM frame feature. The segment features are the statistics of the frame features in each segment. Furthermore, the LSTM segment feature is combined with the segment representation of the initial frame feature to obtain the fusional segment feature. The evaluation on the ISMIR database shows that the LSTM segment feature performs better than the frame feature. Overall, the fusional segment feature achieves 89.71% classification accuracy, an improvement of about 4.19% over the baseline model using a deep neural network (DNN). This significant improvement shows the effectiveness of the proposed segment feature.
AB - In conventional frame-feature-based music genre classification methods, the audio data is represented by independent frames and the sequential nature of audio is totally ignored. If the sequential knowledge is well modeled and combined, the classification performance can be significantly improved. The long short-term memory (LSTM) recurrent neural network (RNN), which uses a set of special memory cells to model long-range feature sequences, has been successfully used for many sequence labeling and sequence prediction tasks. In this paper, we propose LSTM RNN based segment features for music genre classification. The LSTM RNN is used to learn the representation of the LSTM frame feature. The segment features are the statistics of the frame features in each segment. Furthermore, the LSTM segment feature is combined with the segment representation of the initial frame feature to obtain the fusional segment feature. The evaluation on the ISMIR database shows that the LSTM segment feature performs better than the frame feature. Overall, the fusional segment feature achieves 89.71% classification accuracy, an improvement of about 4.19% over the baseline model using a deep neural network (DNN). This significant improvement shows the effectiveness of the proposed segment feature.
KW - Long short-term memory
KW - Music genre classification
KW - Recurrent neural network
KW - Scattering transform
UR - http://www.scopus.com/inward/record.url?scp=85020195796&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2016.7918369
DO - 10.1109/ISCSLP.2016.7918369
M3 - Conference Proceeding
AN - SCOPUS:85020195796
T3 - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
BT - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
A2 - Wang, Hsin-Min
A2 - Hou, Qingzhi
A2 - Wei, Yuan
A2 - Lee, Tan
A2 - Wei, Jianguo
A2 - Xie, Lei
A2 - Feng, Hui
A2 - Dang, Jianwu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Y2 - 17 October 2016 through 20 October 2016
ER -