Transfer learning for improving singing-voice detection in polyphonic instrumental music

Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li

Research output: Chapter in Book or Report/Conference proceedingConference Proceedingpeer-review

8 Citations (Scopus)

Abstract

Detecting singing-voice in polyphonic instrumental music is critical to music information retrieval. To train a robust vocal detector, a large dataset marked with vocal or non-vocal label at frame-level is essential. However, frame-level labeling is time-consuming and labor expensive, resulting there is little well-labeled dataset available for singing-voice detection (S-VD). Hence, we propose a data augmentation method for SVD by transfer learning. In this study, clean speech clips with voice activity endpoints and separate instrumental music clips are artificially added together to simulate polyphonic vocals to train a vocal/non-vocal detector. Due to the different articulation and phonation between speaking and singing, the vocal detector trained with the artificial dataset does not match well with the polyphonic music which is singing vocals together with the instrumental accompaniments. To reduce this mismatch, transfer learning is used to transfer the knowledge learned from the artificial speech-plus-music training set to a small but matched polyphonic dataset, i.e., singing vocals with accompaniments. By transferring the related knowledge to make up for the lack of well-labeled training data in S-VD, the proposed data augmentation method by transfer learning can improve S-VD performance with an F-score improvement from 89.5% to 93.2%.

Original languageEnglish
Title of host publicationInterspeech 2020
PublisherInternational Speech Communication Association
Pages1236-1240
Number of pages5
ISBN (Print)9781713820697
DOIs
Publication statusPublished - 2020
Externally publishedYes
Event21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 202029 Oct 2020

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2020-October
ISSN (Print)2308-457X
ISSN (Electronic)1990-9772

Conference

Conference21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/TerritoryChina
CityShanghai
Period25/10/2029/10/20

Keywords

  • Data augmentation
  • Music information retrieval
  • Singing-voice detection
  • Transfer learning

Fingerprint

Dive into the research topics of 'Transfer learning for improving singing-voice detection in polyphonic instrumental music'. Together they form a unique fingerprint.

Cite this