Long short-term memory recurrent neural network based segment features for music genre classification

Jia Dai, Shan Liang, Wei Xue, Chongjia Ni, Wenju Liu

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

28 Citations (Scopus)

Abstract

In conventional frame-feature-based music genre classification methods, the audio data is represented by independent frames and the sequential nature of audio is ignored entirely. If this sequential knowledge is well modeled and combined, classification performance can be significantly improved. The long short-term memory (LSTM) recurrent neural network (RNN), which uses a set of special memory cells to model long-range feature sequences, has been successfully used for many sequence labeling and sequence prediction tasks. In this paper, we propose LSTM RNN based segment features for music genre classification. The LSTM RNN is used to learn the representation of the LSTM frame feature. The segment features are the statistics of the frame features in each segment. Furthermore, the LSTM segment feature is combined with the segment representation of the initial frame feature to obtain the fusional segment feature. The evaluation on the ISMIR database shows that the LSTM segment feature performs better than the frame feature. Overall, the fusional segment feature achieves 89.71% classification accuracy, an improvement of about 4.19% over the baseline model using a deep neural network (DNN). This significant improvement shows the effectiveness of the proposed segment feature.
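The segment-feature idea in the abstract can be illustrated with a small sketch. The abstract does not say which statistics are used, how long a segment is, or how the two streams are fused, so everything below is an assumption made for illustration: per-dimension mean and standard deviation as the statistics, a fixed non-overlapping segment length, and plain concatenation as the fusion step. The names `segment_stats` and `fuse` and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def segment_stats(frames, seg_len):
    """Summarise each run of `seg_len` frames by per-dimension mean and std.

    `frames` is a list of equal-length feature vectors, one per frame.
    Assumption: non-overlapping segments; a trailing partial segment is dropped.
    """
    segments = []
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        block = frames[start:start + seg_len]
        dims = list(zip(*block))  # transpose: one tuple per feature dimension
        means = [mean(d) for d in dims]
        stds = [pstdev(d) for d in dims]  # population standard deviation
        segments.append(means + stds)
    return segments


def fuse(seg_a, seg_b):
    """Assumed fusion: concatenate two segment-feature streams segment by segment."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both streams must have the same number of segments")
    return [a + b for a, b in zip(seg_a, seg_b)]


# Toy data: 4 frames with 2-dim "initial" features and 1-dim "LSTM" features.
initial_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
lstm_frames = [[0.0], [2.0], [4.0], [4.0]]

initial_seg = segment_stats(initial_frames, seg_len=2)
lstm_seg = segment_stats(lstm_frames, seg_len=2)
fused = fuse(lstm_seg, initial_seg)
```

With a segment length of 2, the four toy frames yield two segments, and each fused vector holds the LSTM-stream statistics followed by the initial-stream statistics. In a real pipeline the LSTM frame features would come from a trained network rather than a hand-written list.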

Original language: English
Title of host publication: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Editors: Hsin-Min Wang, Qingzhi Hou, Yuan Wei, Tan Lee, Jianguo Wei, Lei Xie, Hui Feng, Jianwu Dang
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509042937
DOIs
Publication status: Published - 2 May 2017
Externally published: Yes
Event: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China
Duration: 17 Oct 2016 to 20 Oct 2016

Publication series

Name: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

Conference

Conference: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Country/Territory: China
City: Tianjin
Period: 17/10/16 to 20/10/16

Keywords

  • Long short-term memory
  • Music genre classification
  • Recurrent neural network
  • Scattering transform
