Singing voice detection using multi-feature deep fusion with CNN

Xulong Zhang, Shengchen Li, Zijin Li, Shizhe Chen, Yongwei Gao, Wei Li*

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

6 Citations (Scopus)

Abstract

Singing voice detection is the task of segmenting a song into vocal and non-vocal parts. Common methods train a model on a set of frame-based features and then use that model to classify unseen frames. However, the multi-dimensional features are usually just concatenated for each frame, with little consideration of spatial information. Hence, a deep fusion method over the multi-feature dimensions using a Convolutional Neural Network (CNN) is proposed. A one-dimensional convolution is applied along the feature dimension of each frame, and the resulting high-level features are used for direct binary classification. The performance of the proposed method is on par with state-of-the-art methods on a public dataset.
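The per-frame idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract does not give the features, layer sizes, or training procedure, so the 12-dimensional frames, the kernel count and width, the max-pooling step, the sigmoid output, and all function names below are assumptions chosen only for illustration, and the weights are random rather than trained.

```python
import math
import random


def conv1d_valid(signal, kernel, bias=0.0):
    """Slide `kernel` along `signal` (no padding) and return the responses."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in values]


def vocal_probability(frame, kernels, out_weights, out_bias=0.0):
    """Score one frame: convolve over its feature axis, max-pool, apply a sigmoid."""
    # One pooled activation per kernel; these stand in for the "high-level" features.
    pooled = [max(relu(conv1d_valid(frame, k))) for k in kernels]
    logit = sum(w * p for w, p in zip(out_weights, pooled)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


def segment(frames, kernels, out_weights, threshold=0.5):
    """Label each frame 1 (vocal) or 0 (non-vocal)."""
    return [
        int(vocal_probability(f, kernels, out_weights) >= threshold)
        for f in frames
    ]


# Toy input (assumption): 4 frames, each a 12-dimensional concatenated feature vector.
rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(4)]
# Untrained, randomly initialised parameters (assumption): 5 kernels of width 3.
kernels = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(5)]
out_weights = [rng.uniform(-0.5, 0.5) for _ in range(5)]

labels = segment(frames, kernels, out_weights)
print(labels)  # one 0/1 label per frame
```

Because each kernel slides across neighbouring feature dimensions of a single frame, the ordering of the concatenated features matters, which is one way to read the abstract's remark about spatial information. In practice the parameters would be learned from labelled vocal and non-vocal frames.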

Original language: English
Title of host publication: Proceedings of the 7th Conference on Sound and Music Technology CSMT 2019, Revised Selected Papers
Editors: Haifeng Li, Lin Ma, Shengchen Li, Chunying Fang, Yidan Zhu
Publisher: Springer
Pages: 41-52
Number of pages: 12
ISBN (Print): 9789811527555
Publication status: Published - 2020
Externally published: Yes
Event: 7th Conference on Sound and Music Technology, CSMT 2019 - Harbin, China
Duration: 26 Dec 2019 to 29 Dec 2019

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 635
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 7th Conference on Sound and Music Technology, CSMT 2019
Country/Territory: China
City: Harbin
Period: 26/12/19 to 29/12/19

Keywords

  • Convolution neural network (CNN)
  • Deep learning
  • Multi-feature fusion
  • Singing voice detection (SVD)
