Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis

Qiuqiang Kong; Yuxuan Wang; Xuchen Song; Yin Cao; Wenwu Wang; Mark D. Plumbley

doi:10.1109/ICASSP40776.2020.9053396

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

42 Citations (Scopus)

Abstract

Source separation is the task of separating an audio recording into individual sound sources. Source separation is fundamental for computational auditory scene analysis. Previous work on source separation has focused on separating particular sound classes such as speech and music. Much previous work requires mixtures and clean source pairs for training. In this work, we propose a source separation framework trained with weakly labelled data. Weakly labelled data only contains the tags of an audio clip, without the occurrence time of sound events. We first train a sound event detection system with AudioSet. The trained sound event detection system is used to detect segments that are most likely to contain a target sound event. Then a regression is learnt from a mixture of two randomly selected segments to a target segment conditioned on the audio tagging prediction of the target segment. Our proposed system can separate 527 kinds of sound classes from AudioSet within a single system. A U-Net is adopted for the separation system and achieves an average SDR of 5.67 dB over 527 sound classes in AudioSet.
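The training-pair construction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the sliding-window anchor selection, the additive mixing, and the plain energy-ratio SDR are all assumptions made for illustration, and the frame-wise probabilities are assumed to come from an already trained sound event detection model.

```python
import numpy as np

def select_anchor_segment(frame_probs, class_idx, seg_frames):
    """Pick the window most likely to contain the target class.

    frame_probs: (num_frames, num_classes) frame-wise detection probabilities
    (assumed output of a trained sound event detection system).
    Returns (start, end) frame indices of the highest-scoring window.
    """
    probs = frame_probs[:, class_idx]
    if len(probs) < seg_frames:
        raise ValueError("clip is shorter than the requested segment")
    # Sliding-window sums computed from a cumulative sum.
    csum = np.concatenate([[0.0], np.cumsum(probs)])
    window_sums = csum[seg_frames:] - csum[:-seg_frames]
    start = int(np.argmax(window_sums))
    return start, start + seg_frames

def make_training_pair(target_seg, other_seg, target_tags):
    """Build one (input, condition, target) example.

    The mixture of two segments is the input, the tagging prediction of the
    target segment is the condition, and the target segment is the regression
    target. Simple addition is an assumed mixing rule.
    """
    mixture = target_seg + other_seg
    return mixture, target_tags, target_seg

def sdr(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB, using a plain energy-ratio definition."""
    noise = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))
```

In a sketch like this, a separation network (a U-Net in the paper) would be trained to map `mixture` and `target_tags` to `target_seg`, and `sdr` would score its output against the target segment.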

Original language	English
Title of host publication	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	101-105
Number of pages	5
ISBN (Electronic)	9781509066315
DOIs	https://doi.org/10.1109/ICASSP40776.2020.9053396
Publication status	Published - May 2020
Externally published	Yes
Event	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration	4 May 2020 → 8 May 2020

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2020-May
ISSN (Print)	1520-6149

Conference

Conference	2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory	Spain
City	Barcelona
Period	4/05/20 → 8/05/20

Keywords

AudioSet
Source separation
computational auditory scene analysis
weakly labelled data

Access to Document

10.1109/ICASSP40776.2020.9053396

Cite this

Kong, Q., Wang, Y., Song, X., Cao, Y., Wang, W., & Plumbley, M. D. (2020). Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings (pp. 101-105). Article 9053396 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP40776.2020.9053396

Kong, Qiuqiang; Wang, Yuxuan; Song, Xuchen et al. / Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. pp. 101-105 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{9a2ec884ddbb47f789b658e967644076,
title = "Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis",
abstract = "Source separation is the task of separating an audio recording into individual sound sources. Source separation is fundamental for computational auditory scene analysis. Previous work on source separation has focused on separating particular sound classes such as speech and music. Much previous work requires mixtures and clean source pairs for training. In this work, we propose a source separation framework trained with weakly labelled data. Weakly labelled data only contains the tags of an audio clip, without the occurrence time of sound events. We first train a sound event detection system with AudioSet. The trained sound event detection system is used to detect segments that are most likely to contain a target sound event. Then a regression is learnt from a mixture of two randomly selected segments to a target segment conditioned on the audio tagging prediction of the target segment. Our proposed system can separate 527 kinds of sound classes from AudioSet within a single system. A U-Net is adopted for the separation system and achieves an average SDR of 5.67 dB over 527 sound classes in AudioSet.",
keywords = "AudioSet, Source separation, computational auditory scene analysis, weakly labelled data",
author = "Qiuqiang Kong and Yuxuan Wang and Xuchen Song and Yin Cao and Wenwu Wang and Plumbley, {Mark D.}",
note = "Publisher Copyright: {\textcopyright} 2020 IEEE.; 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 ; Conference date: 04-05-2020 Through 08-05-2020",
year = "2020",
month = may,
doi = "10.1109/ICASSP40776.2020.9053396",
language = "English",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "101--105",
booktitle = "2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings",
}

Kong, Q, Wang, Y, Song, X, Cao, Y, Wang, W & Plumbley, MD 2020, Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis. in 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings, 9053396, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2020-May, Institute of Electrical and Electronics Engineers Inc., pp. 101-105, 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020, Barcelona, Spain, 4/05/20. https://doi.org/10.1109/ICASSP40776.2020.9053396

Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis. / Kong, Qiuqiang; Wang, Yuxuan; Song, Xuchen et al.
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2020. p. 101-105 9053396 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2020-May).

TY  - GEN
T1  - Source Separation with Weakly Labelled Data
T2  - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
AU  - Kong, Qiuqiang
AU  - Wang, Yuxuan
AU  - Song, Xuchen
AU  - Cao, Yin
AU  - Wang, Wenwu
AU  - Plumbley, Mark D.
PY  - 2020/5
Y1  - 2020/5
AB  - Source separation is the task of separating an audio recording into individual sound sources. Source separation is fundamental for computational auditory scene analysis. Previous work on source separation has focused on separating particular sound classes such as speech and music. Much previous work requires mixtures and clean source pairs for training. In this work, we propose a source separation framework trained with weakly labelled data. Weakly labelled data only contains the tags of an audio clip, without the occurrence time of sound events. We first train a sound event detection system with AudioSet. The trained sound event detection system is used to detect segments that are most likely to contain a target sound event. Then a regression is learnt from a mixture of two randomly selected segments to a target segment conditioned on the audio tagging prediction of the target segment. Our proposed system can separate 527 kinds of sound classes from AudioSet within a single system. A U-Net is adopted for the separation system and achieves an average SDR of 5.67 dB over 527 sound classes in AudioSet.
KW  - AudioSet
KW  - Source separation
KW  - computational auditory scene analysis
KW  - weakly labelled data
UR  - http://www.scopus.com/inward/record.url?scp=85089219177&partnerID=8YFLogxK
U2  - 10.1109/ICASSP40776.2020.9053396
DO  - 10.1109/ICASSP40776.2020.9053396
M3  - Conference Proceeding
AN  - SCOPUS:85089219177
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 101
EP  - 105
BT  - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
Y2  - 4 May 2020 through 8 May 2020
ER  -

Kong Q, Wang Y, Song X, Cao Y, Wang W, Plumbley MD. Source Separation with Weakly Labelled Data: An Approach to Computational Auditory Scene Analysis. In 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2020. p. 101-105. 9053396. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP40776.2020.9053396
