Cross-domain cooperative deep stacking network for speech separation

Wei Jiang; Shan Liang; Like Dong; Hong Yang; Wenju Liu; Yunji Wang

doi:10.1109/ICASSP.2015.7178939

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

3 Citations (Scopus)

Abstract

Nowadays supervised speech separation has drawn much attention and shown great promise in the meantime. While there has been a lot of success, existing algorithms perform the task only in one preselected representative domain. In this study, we propose to perform the task in two different time-frequency domains simultaneously and cooperatively, which can model the implicit correlations between different representations of the same speech separation task. Besides, many time-frequency (T-F) units are dominated by noise in low signal-to-noise ratio (SNR) conditions, so more robust features are obtained by stacking features of original mixtures with that extracted from separated speech of each deep stacking network (DSN) block, which can be regarded as a denoised version of the original features. Quantitative experiments show that the proposed cross-domain cooperative deep stacking network (DSN-CDC) has enhanced modeling capability as well as generalization ability, which outperforms a previous algorithm based on standard deep neural networks.

Original language	English
Title of host publication	2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	5083-5087
Number of pages	5
ISBN (Electronic)	9781467369978
DOIs	https://doi.org/10.1109/ICASSP.2015.7178939
Publication status	Published - 2015
Externally published	Yes
Event	40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia; Duration: 19 Apr 2015 → 24 Apr 2015

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2015-August
ISSN (Print)	1520-6149

Conference

Conference	40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country/Territory	Australia
City	Brisbane
Period	19/04/15 → 24/04/15

Keywords

cross-domain cooperative structure
deep neural network
deep stacking network
Speech separation

Access to Document

10.1109/ICASSP.2015.7178939

Cite this

Jiang, W., Liang, S., Dong, L., Yang, H., Liu, W., & Wang, Y. (2015). Cross-domain cooperative deep stacking network for speech separation. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings (pp. 5083-5087). Article 7178939 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2015-August). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2015.7178939

Jiang, Wei ; Liang, Shan ; Dong, Like et al. / Cross-domain cooperative deep stacking network for speech separation. 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 5083-5087 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{a592ef197dc94de8821dc658b2dbf9a1,

title = "Cross-domain cooperative deep stacking network for speech separation",

abstract = "Nowadays supervised speech separation has drawn much attention and shown great promise in the meantime. While there has been a lot of success, existing algorithms perform the task only in one preselected representative domain. In this study, we propose to perform the task in two different time-frequency domains simultaneously and cooperatively, which can model the implicit correlations between different representations of the same speech separation task. Besides, many time-frequency (T-F) units are dominated by noise in low signal-to-noise ratio (SNR) conditions, so more robust features are obtained by stacking features of original mixtures with that extracted from separated speech of each deep stacking network (DSN) block, which can be regarded as a denoised version of the original features. Quantitative experiments show that the proposed cross-domain cooperative deep stacking network (DSN-CDC) has enhanced modeling capability as well as generalization ability, which outperforms a previous algorithm based on standard deep neural networks.",

keywords = "cross-domain cooperative structure, deep neural network, deep stacking network, Speech separation",

author = "Wei Jiang and Shan Liang and Like Dong and Hong Yang and Wenju Liu and Yunji Wang",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 ; Conference date: 19-04-2015 Through 24-04-2015",

year = "2015",

doi = "10.1109/ICASSP.2015.7178939",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "5083--5087",

booktitle = "2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings",

}

Jiang, W, Liang, S, Dong, L, Yang, H, Liu, W & Wang, Y 2015, Cross-domain cooperative deep stacking network for speech separation. in 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings, 7178939, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2015-August, Institute of Electrical and Electronics Engineers Inc., pp. 5083-5087, 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015, Brisbane, Australia, 19/04/15. https://doi.org/10.1109/ICASSP.2015.7178939

Cross-domain cooperative deep stacking network for speech separation. / Jiang, Wei; Liang, Shan; Dong, Like et al.
2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. p. 5083-5087 7178939 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2015-August).

TY - GEN

T1 - Cross-domain cooperative deep stacking network for speech separation

AU - Jiang, Wei

AU - Liang, Shan

AU - Dong, Like

AU - Yang, Hong

AU - Liu, Wenju

AU - Wang, Yunji

PY - 2015

Y1 - 2015

AB - Nowadays supervised speech separation has drawn much attention and shown great promise in the meantime. While there has been a lot of success, existing algorithms perform the task only in one preselected representative domain. In this study, we propose to perform the task in two different time-frequency domains simultaneously and cooperatively, which can model the implicit correlations between different representations of the same speech separation task. Besides, many time-frequency (T-F) units are dominated by noise in low signal-to-noise ratio (SNR) conditions, so more robust features are obtained by stacking features of original mixtures with that extracted from separated speech of each deep stacking network (DSN) block, which can be regarded as a denoised version of the original features. Quantitative experiments show that the proposed cross-domain cooperative deep stacking network (DSN-CDC) has enhanced modeling capability as well as generalization ability, which outperforms a previous algorithm based on standard deep neural networks.

KW - cross-domain cooperative structure

KW - deep neural network

KW - deep stacking network

KW - Speech separation

UR - http://www.scopus.com/inward/record.url?scp=84946079770&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2015.7178939

DO - 10.1109/ICASSP.2015.7178939

M3 - Conference Proceeding

AN - SCOPUS:84946079770

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 5083

EP - 5087

BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015

Y2 - 19 April 2015 through 24 April 2015

ER -

Jiang W, Liang S, Dong L, Yang H, Liu W, Wang Y. Cross-domain cooperative deep stacking network for speech separation. In 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2015. p. 5083-5087. 7178939. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2015.7178939
