Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition

Guanjun Li; Shan Liang; Shuai Nie; Wenju Liu; Zhanlei Yang; Longshuai Xiao

doi:10.21437/Interspeech.2020-1101

Corresponding author: Wenju Liu

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech enhancement front end based on deep neural network (DNN). It is suitable for real-time processing and has shown promising automatic speech recognition (ASR) results. However, the ESF only utilizes the knowledge of fixed beamforming, resulting in limited noise reduction capabilities. In this paper, we propose a DNN-based generalized sidelobe canceller (GSC) that can automatically track the target speaker's direction in real time and use the blocking technique to generate reference noise signals to further reduce noise from the fixed beam pointing to the target direction. The coefficients in the proposed GSC are fully learnable and an ASR criterion is used to optimize the entire network. The 4-channel experiments show that the proposed GSC achieves a relative word error rate improvement of 27.0% compared to the raw observation, 20.6% compared to the oracle direction-based traditional GSC, 10.5% compared to the ESF and 7.9% compared to the oracle mask-based generalized eigenvalue (GEV) beamformer.

Original language	English
Title of host publication	Interspeech 2020
Publisher	International Speech Communication Association
Pages	51-55
Number of pages	5
ISBN (Print)	9781713820697
DOIs	https://doi.org/10.21437/Interspeech.2020-1101
Publication status	Published - 2020
Externally published	Yes
Event	21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China. Duration: 25 Oct 2020 → 29 Oct 2020

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2020-October
ISSN (Print)	2308-457X
ISSN (Electronic)	1990-9772

Conference

Conference	21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory	China
City	Shanghai
Period	25/10/20 → 29/10/20

Keywords

Deep neural network
Generalized sidelobe canceller
Multi-channel speech enhancement
Speech recognition

Access to Document

10.21437/Interspeech.2020-1101

Cite this

Li, G., Liang, S., Nie, S., Liu, W., Yang, Z., & Xiao, L. (2020). Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. In Interspeech 2020 (pp. 51-55). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2020-October). International Speech Communication Association. https://doi.org/10.21437/Interspeech.2020-1101

@inproceedings{8ee40928ed0b4b5db97517af96cad7ff,

title = "Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition",

abstract = "The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech enhancement front end based on deep neural network (DNN). It is suitable for real-time processing and has shown promising automatic speech recognition (ASR) results. However, the ESF only utilizes the knowledge of fixed beamforming, resulting in limited noise reduction capabilities. In this paper, we propose a DNN-based generalized sidelobe canceller (GSC) that can automatically track the target speaker's direction in real time and use the blocking technique to generate reference noise signals to further reduce noise from the fixed beam pointing to the target direction. The coefficients in the proposed GSC are fully learnable and an ASR criterion is used to optimize the entire network. The 4-channel experiments show that the proposed GSC achieves a relative word error rate improvement of 27.0% compared to the raw observation, 20.6% compared to the oracle direction-based traditional GSC, 10.5% compared to the ESF and 7.9% compared to the oracle mask-based generalized eigenvalue (GEV) beamformer.",

keywords = "Deep neural network, Generalized sidelobe canceller, Multi-channel speech enhancement, Speech recognition",

author = "Guanjun Li and Shan Liang and Shuai Nie and Wenju Liu and Zhanlei Yang and Longshuai Xiao",

note = "Publisher Copyright: Copyright {\textcopyright} 2020 ISCA; 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 ; Conference date: 25-10-2020 Through 29-10-2020",

year = "2020",

doi = "10.21437/Interspeech.2020-1101",

language = "English",

isbn = "9781713820697",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

publisher = "International Speech Communication Association",

pages = "51--55",

booktitle = "Interspeech 2020",

}

Li, G, Liang, S, Nie, S, Liu, W, Yang, Z & Xiao, L 2020, Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. in Interspeech 2020. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2020-October, International Speech Communication Association, pp. 51-55, 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, Shanghai, China, 25/10/20. https://doi.org/10.21437/Interspeech.2020-1101

Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. / Li, Guanjun; Liang, Shan; Nie, Shuai et al.
Interspeech 2020. International Speech Communication Association, 2020. p. 51-55 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 2020-October).

TY - GEN

T1 - Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition

AU - Li, Guanjun

AU - Liang, Shan

AU - Nie, Shuai

AU - Liu, Wenju

AU - Yang, Zhanlei

AU - Xiao, Longshuai

PY - 2020

Y1 - 2020

N2 - The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech enhancement front end based on deep neural network (DNN). It is suitable for real-time processing and has shown promising automatic speech recognition (ASR) results. However, the ESF only utilizes the knowledge of fixed beamforming, resulting in limited noise reduction capabilities. In this paper, we propose a DNN-based generalized sidelobe canceller (GSC) that can automatically track the target speaker's direction in real time and use the blocking technique to generate reference noise signals to further reduce noise from the fixed beam pointing to the target direction. The coefficients in the proposed GSC are fully learnable and an ASR criterion is used to optimize the entire network. The 4-channel experiments show that the proposed GSC achieves a relative word error rate improvement of 27.0% compared to the raw observation, 20.6% compared to the oracle direction-based traditional GSC, 10.5% compared to the ESF and 7.9% compared to the oracle mask-based generalized eigenvalue (GEV) beamformer.

AB - The elastic spatial filter (ESF) proposed in recent years is a popular multi-channel speech enhancement front end based on deep neural network (DNN). It is suitable for real-time processing and has shown promising automatic speech recognition (ASR) results. However, the ESF only utilizes the knowledge of fixed beamforming, resulting in limited noise reduction capabilities. In this paper, we propose a DNN-based generalized sidelobe canceller (GSC) that can automatically track the target speaker's direction in real time and use the blocking technique to generate reference noise signals to further reduce noise from the fixed beam pointing to the target direction. The coefficients in the proposed GSC are fully learnable and an ASR criterion is used to optimize the entire network. The 4-channel experiments show that the proposed GSC achieves a relative word error rate improvement of 27.0% compared to the raw observation, 20.6% compared to the oracle direction-based traditional GSC, 10.5% compared to the ESF and 7.9% compared to the oracle mask-based generalized eigenvalue (GEV) beamformer.

KW - Deep neural network

KW - Generalized sidelobe canceller

KW - Multi-channel speech enhancement

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85098213803&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2020-1101

DO - 10.21437/Interspeech.2020-1101

M3 - Conference Proceeding

AN - SCOPUS:85098213803

SN - 9781713820697

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 51

EP - 55

BT - Interspeech 2020

PB - International Speech Communication Association

T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020

Y2 - 25 October 2020 through 29 October 2020

ER -

Li G, Liang S, Nie S, Liu W, Yang Z, Xiao L. Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. In Interspeech 2020. International Speech Communication Association. 2020. p. 51-55. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2020-1101
