Two-Stage Multi-Target Joint Learning for Monaural Speech Separation

Shuai Nie; Shan Liang; Wei Xue; Xueliang Zhang; Wenju Liu; Like Dong; Hong Yang

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

7 Citations (Scopus)

Abstract

Recently, supervised speech separation has been extensively studied and has shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets exhibit prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods model each T-F unit independently with only one target and largely ignore this useful information. In this paper, we propose a two-stage multi-target joint learning method to jointly model the related speech separation targets at the frame level. Systematic experiments show that the proposed approach consistently achieves better separation and generalization performance in low signal-to-noise ratio (SNR) conditions.

Original language	English
Title of host publication	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Pages	1503-1507
Number of pages	5
Volume	2015-January
Publication status	Published - 2015
Externally published	Yes
Event	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration	6 Sept 2015 → 10 Sept 2015

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)	2308-457X

Conference

Conference	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Country/Territory	Germany
City	Dresden
Period	6/09/15 → 10/09/15

Keywords

Computational auditory scene analysis (CASA)
Multi-target learning
Speech separation

Cite this

Nie, S., Liang, S., Xue, W., Zhang, X., Liu, W., Dong, L., & Yang, H. (2015). Two-Stage Multi-Target Joint Learning for Monaural Speech Separation. In 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 (Vol. 2015-January, pp. 1503-1507). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{d7ee07aca4ba48fd91fc39dff837292f,

title = "Two-Stage Multi-Target Joint Learning for Monaural Speech Separation",

abstract = "Recently, supervised speech separation has been extensively studied and has shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets exhibit prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods model each T-F unit independently with only one target and largely ignore this useful information. In this paper, we propose a two-stage multi-target joint learning method to jointly model the related speech separation targets at the frame level. Systematic experiments show that the proposed approach consistently achieves better separation and generalization performance in low signal-to-noise ratio (SNR) conditions.",

keywords = "Computational auditory scene analysis (CASA), Multi-target learning, Speech separation",

author = "Shuai Nie and Shan Liang and Wei Xue and Xueliang Zhang and Wenju Liu and Like Dong and Hong Yang",

note = "Publisher Copyright: Copyright {\textcopyright} 2015 ISCA.; 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 ; Conference date: 06-09-2015 Through 10-09-2015",

year = "2015",

language = "English",

volume = "2015-January",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

pages = "1503--1507",

booktitle = "16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015",

}

Nie, S, Liang, S, Xue, W, Zhang, X, Liu, W, Dong, L & Yang, H 2015, Two-Stage Multi-Target Joint Learning for Monaural Speech Separation. in 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015. vol. 2015-January, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1503-1507, 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany, 6/09/15.

Two-Stage Multi-Target Joint Learning for Monaural Speech Separation. / Nie, Shuai; Liang, Shan; Xue, Wei et al.
16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015. Vol. 2015-January 2015. p. 1503-1507 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

TY - GEN

T1 - Two-Stage Multi-Target Joint Learning for Monaural Speech Separation

AU - Nie, Shuai

AU - Liang, Shan

AU - Xue, Wei

AU - Zhang, Xueliang

AU - Liu, Wenju

AU - Dong, Like

AU - Yang, Hong

PY - 2015

Y1 - 2015

N2 - Recently, supervised speech separation has been extensively studied and has shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets exhibit prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods model each T-F unit independently with only one target and largely ignore this useful information. In this paper, we propose a two-stage multi-target joint learning method to jointly model the related speech separation targets at the frame level. Systematic experiments show that the proposed approach consistently achieves better separation and generalization performance in low signal-to-noise ratio (SNR) conditions.

AB - Recently, supervised speech separation has been extensively studied and has shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets exhibit prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods model each T-F unit independently with only one target and largely ignore this useful information. In this paper, we propose a two-stage multi-target joint learning method to jointly model the related speech separation targets at the frame level. Systematic experiments show that the proposed approach consistently achieves better separation and generalization performance in low signal-to-noise ratio (SNR) conditions.

KW - Computational auditory scene analysis (CASA)

KW - Multi-target learning

KW - Speech separation

UR - http://www.scopus.com/inward/record.url?scp=84959170129&partnerID=8YFLogxK

M3 - Conference Proceeding

AN - SCOPUS:84959170129

VL - 2015-January

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 1503

EP - 1507

BT - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015

T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015

Y2 - 6 September 2015 through 10 September 2015

ER -
