Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets

Fatemeh Alishahi; Yin Cao; Youngkoen Kim; Asif Mohammad

doi:10.1109/IWAENC53105.2022.9914701

Qualcomm Incorporated

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Nonlinear echo in the presence of background noise can degrade the performance of digital signal processing algorithms. Deep neural networks, with their ability to model complex nonlinear functions, can potentially address this issue. In this paper, a deep, causal neural network based on dual streaming of the near-end microphone and far-end speech signals is employed for real-time nonlinear echo cancellation and noise suppression. The features extracted from the two streams are coupled into a shared neural network for joint echo and noise cancellation. The training target is a mixture of spectral-mapping and masking-based targets, which are gated through a feedforward neural network. The model is evaluated with both signal-level and perception-level metrics across scenarios with SI-SDR as low as -25 dB. Furthermore, the effect of mixing training targets is assessed by evaluating different models.
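The abstract names two ideas that a short sketch can make concrete: the SI-SDR metric and a gated blend of a masking-based estimate with a spectral-mapping estimate. The Python below is only an illustration under stated assumptions. `si_sdr` follows the commonly used scale-invariant SDR definition. `gated_mixture` is a hypothetical reading of "gated", in which a gate value in [0, 1] interpolates between the two estimates; this record does not give the paper's actual formulation, and all function and parameter names are invented for the example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two 1-D signals (standard definition)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as distortion.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def gated_mixture(mic_mag, mask, mapped_mag, gate):
    """Blend a masked microphone spectrum with a directly mapped spectrum.

    ASSUMPTION: a convex combination controlled by `gate` in [0, 1].
    This is a hypothetical illustration, not the paper's method.
    All arrays share one shape, e.g. (frames, frequency_bins).
    """
    masked = mask * mic_mag  # masking-based estimate
    return gate * masked + (1.0 - gate) * mapped_mag  # gated blend
```

With `gate` equal to 1 everywhere the output reduces to the masking-based estimate, and with `gate` equal to 0 it reduces to the spectral-mapping estimate; intermediate values mix the two per time-frequency bin.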

Original language	English
Title of host publication	International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781665468671
DOIs	https://doi.org/10.1109/IWAENC53105.2022.9914701
Publication status	Published - 2022
Externally published	Yes
Event	17th International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Bamberg, Germany; Duration: 5 Sept 2022 → 8 Sept 2022

Publication series

Name	International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings

Conference

Conference	17th International Workshop on Acoustic Signal Enhancement, IWAENC 2022
Country/Territory	Germany
City	Bamberg
Period	5/09/22 → 8/09/22

Keywords

Supervised speech enhancement
deep neural network
recurrent neural networks
training targets

Access to Document

10.1109/IWAENC53105.2022.9914701

Cite this

Alishahi, F., Cao, Y., Kim, Y., & Mohammad, A. (2022). Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets. In International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings (International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IWAENC53105.2022.9914701

Alishahi, Fatemeh ; Cao, Yin ; Kim, Youngkoen et al. / Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets. International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. (International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings).

@inproceedings{36f3473c83414ca698fa8da231213a39,

title = "Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets",

abstract = "Nonlinear echo in presence of background noise can degrade the performance of digital signal processing algorithms. Deep neural networks with their ability to model complex nonlinear functions can potentially address this issue. In this paper, a deep and causal neural network based on dual streaming of the near-end microphone and far-end speech signals is employed to leverage the real-time nonlinear echo cancellation and noise suppression. The extracted features of two streams are coupled into a shared neural network for joint echo and noise cancellation. The training target is a mixture of spectral mapping and masking-based targets which are gated through a feedforward neural network. The model is evaluated in terms of both signal-level and perception-level metrics for different scenarios with a range of SI-SDR as low as -25 dB. Furthermore, the effect of mixing of training targets is assessed by evaluating different models.",

keywords = "Supervised speech enhancement, deep neural network, recurrent neural networks, training targets",

author = "Fatemeh Alishahi and Yin Cao and Youngkoen Kim and Asif Mohammad",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 17th International Workshop on Acoustic Signal Enhancement, IWAENC 2022 ; Conference date: 05-09-2022 Through 08-09-2022",

year = "2022",

doi = "10.1109/IWAENC53105.2022.9914701",

language = "English",

series = "International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings",

}

Alishahi, F, Cao, Y, Kim, Y & Mohammad, A 2022, Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets. in International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings. International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 17th International Workshop on Acoustic Signal Enhancement, IWAENC 2022, Bamberg, Germany, 5/09/22. https://doi.org/10.1109/IWAENC53105.2022.9914701

Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets. / Alishahi, Fatemeh; Cao, Yin; Kim, Youngkoen et al.
International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. (International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings).


TY - GEN

T1 - Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets

AU - Alishahi, Fatemeh

AU - Cao, Yin

AU - Kim, Youngkoen

AU - Mohammad, Asif

PY - 2022

Y1 - 2022

N2 - Nonlinear echo in presence of background noise can degrade the performance of digital signal processing algorithms. Deep neural networks with their ability to model complex nonlinear functions can potentially address this issue. In this paper, a deep and causal neural network based on dual streaming of the near-end microphone and far-end speech signals is employed to leverage the real-time nonlinear echo cancellation and noise suppression. The extracted features of two streams are coupled into a shared neural network for joint echo and noise cancellation. The training target is a mixture of spectral mapping and masking-based targets which are gated through a feedforward neural network. The model is evaluated in terms of both signal-level and perception-level metrics for different scenarios with a range of SI-SDR as low as -25 dB. Furthermore, the effect of mixing of training targets is assessed by evaluating different models.

AB - Nonlinear echo in presence of background noise can degrade the performance of digital signal processing algorithms. Deep neural networks with their ability to model complex nonlinear functions can potentially address this issue. In this paper, a deep and causal neural network based on dual streaming of the near-end microphone and far-end speech signals is employed to leverage the real-time nonlinear echo cancellation and noise suppression. The extracted features of two streams are coupled into a shared neural network for joint echo and noise cancellation. The training target is a mixture of spectral mapping and masking-based targets which are gated through a feedforward neural network. The model is evaluated in terms of both signal-level and perception-level metrics for different scenarios with a range of SI-SDR as low as -25 dB. Furthermore, the effect of mixing of training targets is assessed by evaluating different models.

KW - Supervised speech enhancement

KW - deep neural network

KW - recurrent neural networks

KW - training targets

UR - http://www.scopus.com/inward/record.url?scp=85141356722&partnerID=8YFLogxK

U2 - 10.1109/IWAENC53105.2022.9914701

DO - 10.1109/IWAENC53105.2022.9914701

M3 - Conference Proceeding

AN - SCOPUS:85141356722

T3 - International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings

BT - International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 17th International Workshop on Acoustic Signal Enhancement, IWAENC 2022

Y2 - 5 September 2022 through 8 September 2022

ER -

Alishahi F, Cao Y, Kim Y, Mohammad A. Echo Cancelation and Noise Suppression by Training a Dual-Stream Recurrent Network with a Mixture of Training Targets. In International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. (International Workshop on Acoustic Signal Enhancement, IWAENC 2022 - Proceedings). doi: 10.1109/IWAENC53105.2022.9914701
