Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

Shuai Nie, Shan Liang*, Bin Liu, Yaping Zhang, Wenju Liu, Jianhua Tao

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

Noise statistics and speech spectrum characteristics are the essential information for single-channel speech enhancement. Signal processing-based methods rely mainly on noise statistics estimation. They perform very well for stationary noise but still struggle with non-stationary noise. Deep learning-based methods, by contrast, focus mainly on perceiving the spectral characteristics of speech and are able to deal with non-stationary noise. However, their performance degrades dramatically for unseen noise types, which could be due to over-reliance on data and neglect of domain knowledge from signal processing. A hybrid signal processing/deep learning scheme may therefore be a smart alternative. In this paper, we incorporate the powerful perceptual capabilities of deep learning into the conventional speech enhancement framework. Deep learning is used to estimate the speech presence probability and the update factor of the noise statistics, which are then integrated into a Wiener filter-based speech enhancement structure to enhance the desired speech. All components are jointly optimized with a spectrum approximation objective. Systematic experiments on CHiME-4 and NOISEX-92 demonstrate the proposed hybrid signal processing/deep learning approach to noise suppression in both noise-unmatched and noise-matched conditions.
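
The abstract does not give the update equations, so the following is only a minimal sketch of how a speech presence probability and a noise update factor could feed a Wiener-style gain. It assumes a common recursive noise power update weighted by the speech presence probability and a maximum-likelihood SNR estimate; these choices, the function name `wiener_gain_sketch`, and the `gain_floor` parameter are illustrative assumptions, not the authors' formulation. The two probability-like inputs stand in for network outputs, and the joint training described above is not shown.

```python
from typing import List, Sequence


def wiener_gain_sketch(
    noisy_power: Sequence[Sequence[float]],    # |Y(t, f)|^2, frames x frequency bins
    speech_prob: Sequence[Sequence[float]],    # p(t, f) in [0, 1]; stand-in for a network output
    update_factor: Sequence[Sequence[float]],  # alpha(t, f) in [0, 1]; stand-in for a network output
    gain_floor: float = 0.05,                  # hypothetical lower bound on the gain
) -> List[List[float]]:
    """Return a Wiener-style gain G(t, f) for every time-frequency bin."""
    eps = 1e-12
    # Assumption: the first frame is used to initialise the noise power estimate.
    noise = list(noisy_power[0])
    gains: List[List[float]] = []
    for y, p, a in zip(noisy_power, speech_prob, update_factor):
        frame_gain = []
        for f in range(len(y)):
            # Effective smoothing factor: approaches 1 (no update) when speech is likely.
            smooth = a[f] + (1.0 - a[f]) * p[f]
            # Recursive update of the noise power estimate.
            noise[f] = smooth * noise[f] + (1.0 - smooth) * y[f]
            # Maximum-likelihood SNR estimate, floored at zero.
            snr = max(y[f] / (noise[f] + eps) - 1.0, 0.0)
            # Wiener gain with a floor to limit over-suppression.
            frame_gain.append(max(snr / (1.0 + snr), gain_floor))
        gains.append(frame_gain)
    return gains


if __name__ == "__main__":
    power = [[1.0, 1.0], [100.0, 1.0]]   # second frame has a loud bin 0
    prob = [[0.0, 0.0], [1.0, 0.0]]      # speech assumed present only in that bin
    alpha = [[0.9, 0.9], [0.9, 0.9]]
    for frame in wiener_gain_sketch(power, prob, alpha):
        print([round(g, 3) for g in frame])
```

In such a scheme the enhanced magnitude would be obtained by multiplying each gain by the corresponding noisy magnitude before resynthesis with the noisy phase; that step is omitted here.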

Original language: English
Title of host publication: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Pages: 3219-3223
Number of pages: 5
Volume: 2018-September
Publication status: Published - 2018
Externally published: Yes
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sept 2018 - 6 Sept 2018

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Country/Territory: India
City: Hyderabad
Period: 2/09/18 - 6/09/18

Keywords

  • Deep learning
  • Noise tracking
  • Signal processing
  • Speech enhancement
