TY - GEN
T1 - Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation
AU - Nie, Shuai
AU - Liang, Shan
AU - Li, Hao
AU - Zhang, Xueliang
AU - Yang, Zhanlei
AU - Liu, Wen Ju
AU - Dong, Li Ke
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016
Y1 - 2016
N2 - The targets of speech separation, whether ideal masks or magnitude spectrograms of interest, have prominent spectro-temporal structures. These structures are well worth exploiting for speech separation, yet previous work has usually ignored them. In this paper, we use nonnegative matrix factorization (NMF) to exploit the spectro-temporal structures of magnitude spectrograms. With nonnegativity constraints, NMF can capture the basis spectral patterns of speech and noise. The learned basis spectra are then integrated into a deep neural network (DNN), which reconstructs the magnitude spectrograms of speech and noise as nonnegative linear combinations of the basis spectra. Using the reconstructed spectrograms, we further explore a discriminative training objective and a joint optimization framework for the proposed model. Systematic experiments show that the proposed model is competitive with previous methods on monaural speech separation tasks.
AB - The targets of speech separation, whether ideal masks or magnitude spectrograms of interest, have prominent spectro-temporal structures. These structures are well worth exploiting for speech separation, yet previous work has usually ignored them. In this paper, we use nonnegative matrix factorization (NMF) to exploit the spectro-temporal structures of magnitude spectrograms. With nonnegativity constraints, NMF can capture the basis spectral patterns of speech and noise. The learned basis spectra are then integrated into a deep neural network (DNN), which reconstructs the magnitude spectrograms of speech and noise as nonnegative linear combinations of the basis spectra. Using the reconstructed spectrograms, we further explore a discriminative training objective and a joint optimization framework for the proposed model. Systematic experiments show that the proposed model is competitive with previous methods on monaural speech separation tasks.
KW - Deep Neural Network
KW - Nonnegative Matrix Factorization
KW - Spectro-Temporal Structures
KW - Speech Separation
UR - http://www.scopus.com/inward/record.url?scp=84973370104&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7471719
DO - 10.1109/ICASSP.2016.7471719
M3 - Conference Proceeding
AN - SCOPUS:84973370104
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 469
EP - 473
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -