Exploiting spectro-temporal structures using NMF for DNN-based supervised speech separation

Shuai Nie, Shan Liang, Hao Li, Xueliang Zhang, Zhanlei Yang, Wen Ju Liu, Li Ke Dong

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

15 Citations (Scopus)

Abstract

The targets of speech separation, whether ideal masks or magnitude spectrograms of interest, have prominent spectro-temporal structures. These structures are well worth exploiting for speech separation, but previous work has usually ignored them. In this paper, we use nonnegative matrix factorization (NMF) to exploit the spectro-temporal structures of magnitude spectrograms. With nonnegativity constraints, NMF can capture the basis spectral patterns of speech and noise. The learned basis spectra are then integrated into a deep neural network (DNN) to reconstruct the magnitude spectrograms of speech and noise as nonnegative linear combinations of those bases. Using the reconstructed spectrograms, we further explore a discriminative training objective and a joint optimization framework for the proposed model. Systematic experiments show that the proposed model is competitive with previous methods in monaural speech separation tasks.
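
The factorization step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard multiplicative-update NMF for the Euclidean cost, uses random low-rank matrices as stand-ins for speech and noise magnitude spectrograms, assumes that magnitudes add in the mixture, and replaces the DNN with a plain activation solver over the fixed bases. All function names, sizes, and ranks below are hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the updates


def nmf(V, rank, n_iter=200, seed=0):
    """Factorize nonnegative V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost, which keep
    W and H nonnegative as long as they start nonnegative.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS  # basis spectra (columns)
    H = rng.random((rank, V.shape[1])) + EPS  # per-frame activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def activations(V, W, n_iter=200, seed=0):
    """Estimate nonnegative activations H for fixed bases W.

    Stand-in for the DNN in the paper, which would predict the
    activations instead (assumption made for illustration only).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def toy_spectrogram(n_freq, n_frames, rank, seed):
    """Random nonnegative low-rank matrix standing in for a spectrogram."""
    rng = np.random.default_rng(seed)
    return rng.random((n_freq, rank)) @ rng.random((rank, n_frames))


# Learn separate basis spectra for "speech" and "noise" (toy data).
speech = toy_spectrogram(64, 100, 4, seed=1)
noise = toy_spectrogram(64, 100, 4, seed=2)
W_s, _ = nmf(speech, rank=4)
W_n, _ = nmf(noise, rank=4)

# Reconstruct both sources from the mixture as nonnegative linear
# combinations of the fixed bases (assumes magnitudes add).
mixture = speech + noise
W = np.hstack([W_s, W_n])
H = activations(mixture, W)
speech_hat = W_s @ H[:4]
noise_hat = W_n @ H[4:]
```

Because the bases and activations are both nonnegative, `speech_hat` and `noise_hat` are nonnegative by construction, which is the property the abstract relies on when reconstructing magnitude spectrograms.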

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 469-473
Number of pages: 5
ISBN (Electronic): 9781479999880
DOIs
Publication status: Published - 2016
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2016-May
ISSN (Print): 1520-6149

Conference

Conference: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20/03/16 to 25/03/16

Keywords

  • Deep Neural Network
  • Nonnegative Matrix Factorization
  • Spectro-Temporal Structures
  • Speech Separation
