TY - GEN
T1 - Cross-domain cooperative deep stacking network for speech separation
AU - Jiang, Wei
AU - Liang, Shan
AU - Dong, Like
AU - Yang, Hong
AU - Liu, Wenju
AU - Wang, Yunji
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - Supervised speech separation has drawn much attention and shown great promise. Despite this success, existing algorithms perform the task in only one preselected representative domain. In this study, we propose to perform the task in two different time-frequency (T-F) domains simultaneously and cooperatively, which can model the implicit correlations between different representations of the same speech separation task. In addition, because many T-F units are dominated by noise in low signal-to-noise ratio (SNR) conditions, more robust features are obtained by stacking the features of the original mixtures with those extracted from the separated speech of each deep stacking network (DSN) block, which can be regarded as a denoised version of the original features. Quantitative experiments show that the proposed cross-domain cooperative deep stacking network (DSN-CDC) has enhanced modeling capability and generalization ability, and that it outperforms a previous algorithm based on standard deep neural networks.
KW - cross-domain cooperative structure
KW - deep neural network
KW - deep stacking network
KW - speech separation
UR - http://www.scopus.com/inward/record.url?scp=84946079770&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2015.7178939
DO - 10.1109/ICASSP.2015.7178939
M3 - Conference Proceeding
AN - SCOPUS:84946079770
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5083
EP - 5087
BT - 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Y2 - 19 April 2015 through 24 April 2015
ER -