Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement


doi:10.21437/Interspeech.2018-1020


Shuai Nie, Shan Liang^*, Bin Liu, Yaping Zhang, Wenju Liu, Jianhua Tao

^*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

8 Citations (Scopus)

Abstract

Noise statistics and the spectral characteristics of speech are the essential information for single-channel speech enhancement. Signal processing-based methods rely mainly on noise statistics estimation; they perform well on stationary noise but struggle with non-stationary noise. Deep learning-based methods, by contrast, focus mainly on perceiving the spectral characteristics of speech and can cope with non-stationary noise. However, their performance can degrade dramatically for unseen noise types, possibly because of over-reliance on training data and neglect of signal-processing domain knowledge. A hybrid signal processing/deep learning scheme may therefore be a smart alternative. In this paper, we incorporate the powerful perceptual capabilities of deep learning into the conventional speech enhancement framework. Deep learning is used to estimate the speech presence probability and the update factor of the noise statistics, which are then integrated into a Wiener filter-based speech enhancement structure to enhance the desired speech. All components are jointly optimized with a spectrum approximation objective. Systematic experiments on CHiME-4 and NOISEX-92 demonstrate the proposed hybrid signal processing/deep learning approach to noise suppression in both noise-unmatched and noise-matched conditions.
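The abstract describes noise statistics that are tracked with a speech presence probability (SPP) and an update factor, then fed into a Wiener filter. The Python/NumPy sketch below illustrates that signal flow under stated assumptions; it is not the authors' implementation. Assumptions: the `spp` and `alpha` arrays stand in for the network outputs; the noise update follows a generic MCRA-style recursion; the a priori SNR uses a simple floored maximum-likelihood estimate; and the "spectrum approximation" objective is taken to be a mean squared error on magnitude spectra. The function names `track_noise_and_wiener_gain` and `spectrum_approximation_loss` are hypothetical.

```python
import numpy as np


def track_noise_and_wiener_gain(noisy_power, spp, alpha, noise_init, xi_min=1e-3):
    """Hypothetical helper: recursive noise-PSD tracking followed by a Wiener gain.

    Assumption: MCRA-style recursion and ML a priori SNR; not taken from the paper.
    noisy_power : (T, F) array, |Y(t, f)|^2 of the noisy STFT
    spp         : (T, F) array in [0, 1], speech presence probability (stand-in)
    alpha       : (T, F) array in [0, 1], noise-update factor (stand-in)
    noise_init  : (F,) array, initial noise PSD estimate
    Returns (noise_psd, gain), both of shape (T, F).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    T, F = noisy_power.shape
    noise_psd = np.empty((T, F))
    gain = np.empty((T, F))
    prev = np.asarray(noise_init, dtype=float)
    for t in range(T):
        # Effective smoothing factor: as spp -> 1 it tends to 1,
        # so the noise estimate is held while speech is likely present.
        a_eff = alpha[t] + (1.0 - alpha[t]) * spp[t]
        cur = a_eff * prev + (1.0 - a_eff) * noisy_power[t]
        noise_psd[t] = cur
        # A posteriori SNR, then a floored maximum-likelihood a priori SNR.
        gamma = noisy_power[t] / np.maximum(cur, 1e-12)
        xi = np.maximum(gamma - 1.0, xi_min)
        # Wiener gain G = xi / (1 + xi).
        gain[t] = xi / (1.0 + xi)
        prev = cur
    return noise_psd, gain


def spectrum_approximation_loss(gain, noisy_mag, clean_mag):
    """Assumed objective: MSE between enhanced (G * |Y|) and clean magnitudes."""
    return float(np.mean((gain * noisy_mag - clean_mag) ** 2))


# Usage with stand-in values (no trained network involved).
T, F = 4, 3
noise_init = np.ones(F)

# Noise-only frames: spp = 0, the estimate follows |Y|^2 and the gain is tiny.
noise_psd, gain = track_noise_and_wiener_gain(
    np.ones((T, F)), np.zeros((T, F)), np.full((T, F), 0.9), noise_init)

# Speech-dominated frames: spp = 1, the noise estimate stays at its initial value.
held_psd, speech_gain = track_noise_and_wiener_gain(
    np.full((T, F), 100.0), np.ones((T, F)), np.full((T, F), 0.9), noise_init)
```

The abstract states that all components are optimized jointly; reproducing that would mean writing this forward pass in an automatic-differentiation framework so that gradients of the loss reach whatever produces `spp` and `alpha`. NumPy is used here only to show the forward computation.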

Original language	English
Title of host publication	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Pages	3219-3223
Number of pages	5
Volume	2018-September
DOIs	https://doi.org/10.21437/Interspeech.2018-1020
Publication status	Published - 2018
Externally published	Yes
Event	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India; Duration: 2 Sept 2018 → 6 Sept 2018

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)	2308-457X

Conference

Conference	19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Country/Territory	India
City	Hyderabad
Period	2/09/18 → 6/09/18

Keywords

Deep learning
Noise tracking
Signal processing
Speech enhancement


Cite this

Nie, S., Liang, S., Liu, B., Zhang, Y., Liu, W., & Tao, J. (2018). Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. In 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 (Vol. 2018-September, pp. 3219-3223). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). https://doi.org/10.21437/Interspeech.2018-1020

Nie, Shuai ; Liang, Shan ; Liu, Bin et al. / Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018. Vol. 2018-September 2018. pp. 3219-3223 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{874e5c018f7c428f8d40f9f26447bf18,
  title     = "Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement",
  abstract  = "Noise statistics and the spectral characteristics of speech are the essential information for single-channel speech enhancement. Signal processing-based methods rely mainly on noise statistics estimation; they perform well on stationary noise but struggle with non-stationary noise. Deep learning-based methods, by contrast, focus mainly on perceiving the spectral characteristics of speech and can cope with non-stationary noise. However, their performance can degrade dramatically for unseen noise types, possibly because of over-reliance on training data and neglect of signal-processing domain knowledge. A hybrid signal processing/deep learning scheme may therefore be a smart alternative. In this paper, we incorporate the powerful perceptual capabilities of deep learning into the conventional speech enhancement framework. Deep learning is used to estimate the speech presence probability and the update factor of the noise statistics, which are then integrated into a Wiener filter-based speech enhancement structure to enhance the desired speech. All components are jointly optimized with a spectrum approximation objective. Systematic experiments on CHiME-4 and NOISEX-92 demonstrate the proposed hybrid signal processing/deep learning approach to noise suppression in both noise-unmatched and noise-matched conditions.",
  keywords  = "Deep learning, Noise tracking, Signal processing, Speech enhancement",
  author    = "Shuai Nie and Shan Liang and Bin Liu and Yaping Zhang and Wenju Liu and Jianhua Tao",
  note      = "Publisher Copyright: {\textcopyright} 2018 International Speech Communication Association. All rights reserved.; 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018; Conference date: 02-09-2018 through 06-09-2018",
  year      = "2018",
  doi       = "10.21437/Interspeech.2018-1020",
  language  = "English",
  volume    = "2018-September",
  series    = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  pages     = "3219--3223",
  booktitle = "19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018",
}

Nie, S, Liang, S, Liu, B, Zhang, Y, Liu, W & Tao, J 2018, Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. in 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018. vol. 2018-September, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3219-3223, 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018, Hyderabad, India, 2/09/18. https://doi.org/10.21437/Interspeech.2018-1020

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. / Nie, Shuai; Liang, Shan; Liu, Bin et al.
19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018. Vol. 2018-September 2018. p. 3219-3223 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).


TY  - GEN
T1  - Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
AU  - Nie, Shuai
AU  - Liang, Shan
AU  - Liu, Bin
AU  - Zhang, Yaping
AU  - Liu, Wenju
AU  - Tao, Jianhua
PY  - 2018
Y1  - 2018
AB  - Noise statistics and the spectral characteristics of speech are the essential information for single-channel speech enhancement. Signal processing-based methods rely mainly on noise statistics estimation; they perform well on stationary noise but struggle with non-stationary noise. Deep learning-based methods, by contrast, focus mainly on perceiving the spectral characteristics of speech and can cope with non-stationary noise. However, their performance can degrade dramatically for unseen noise types, possibly because of over-reliance on training data and neglect of signal-processing domain knowledge. A hybrid signal processing/deep learning scheme may therefore be a smart alternative. In this paper, we incorporate the powerful perceptual capabilities of deep learning into the conventional speech enhancement framework. Deep learning is used to estimate the speech presence probability and the update factor of the noise statistics, which are then integrated into a Wiener filter-based speech enhancement structure to enhance the desired speech. All components are jointly optimized with a spectrum approximation objective. Systematic experiments on CHiME-4 and NOISEX-92 demonstrate the proposed hybrid signal processing/deep learning approach to noise suppression in both noise-unmatched and noise-matched conditions.
KW  - Deep learning
KW  - Noise tracking
KW  - Signal processing
KW  - Speech enhancement
UR  - http://www.scopus.com/inward/record.url?scp=85055002384&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2018-1020
DO  - 10.21437/Interspeech.2018-1020
M3  - Conference Proceeding
AN  - SCOPUS:85055002384
VL  - 2018-September
T3  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP  - 3219
EP  - 3223
BT  - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
T2  - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Y2  - 2 September 2018 through 6 September 2018
ER  - 

Nie S, Liang S, Liu B, Zhang Y, Liu W, Tao J. Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement. In 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018. Vol. 2018-September. 2018. p. 3219-3223. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2018-1020
