Masking-based Neural Beamformer for Multichannel Speech Enhancement

Shuai Nie; Shan Liang; Zhanlei Yang; Longshuai Xiao; Wenju Liu; Jianhua Tao

doi:10.1109/ISCSLP57327.2022.10037878

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

2 Citations (Scopus)

Abstract

Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking can significantly reduce noise, but it inevitably introduces speech distortion, especially in far-field reverberant environments. Beamforming, by contrast, can effectively avoid speech distortion and performs very well in reverberant conditions. Masking-based beamforming is therefore a natural way to combine the two. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under a certain sound-field assumption, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. Mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.
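
For readers unfamiliar with the conventional approach that the abstract contrasts against, the following is a minimal sketch of a mask-based adaptive (MVDR-style) beamformer. It is not the neural beamformer proposed in the paper. The function name `mask_based_mvdr`, the array layout, the reference-microphone choice, and the diagonal loading are all assumptions made for illustration. The per-frequency linear solve in the loop is the kind of matrix inversion the abstract describes as costly and potentially unstable.

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, ref_mic=0, eps=1e-8):
    """Conventional mask-based MVDR sketch (illustrative, not the paper's method).

    Y           : complex STFT of the array signal, shape (mics, frames, freqs)
    speech_mask : time-frequency speech mask in [0, 1], shape (frames, freqs)
    Returns the beamformed STFT, shape (frames, freqs).
    """
    n_mics, n_frames, n_freqs = Y.shape
    noise_mask = 1.0 - speech_mask
    out = np.zeros((n_frames, n_freqs), dtype=complex)
    for f in range(n_freqs):
        Yf = Y[:, :, f]  # (mics, frames)
        ms, mn = speech_mask[:, f], noise_mask[:, f]
        # Mask-weighted spatial covariance matrices of speech and noise
        phi_s = (ms * Yf) @ Yf.conj().T / (ms.sum() + eps)
        phi_n = (mn * Yf) @ Yf.conj().T / (mn.sum() + eps)
        # Small diagonal loading (an assumption) to keep the solve well conditioned
        load = 1e-6 * np.trace(phi_n).real / n_mics + eps
        phi_n = phi_n + load * np.eye(n_mics)
        # One linear solve per frequency: phi_n^{-1} phi_s
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref_mic] / (np.trace(num) + eps)
        # Apply the spatial filter: w^H y for every frame
        out[:, f] = w.conj() @ Yf
    return out
```

In a masking-based pipeline of this kind, `speech_mask` would typically come from a mask-prediction network, and the filter itself is computed in closed form rather than learned.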

Original language	English
Title of host publication	2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Editors	Kong Aik Lee, Hung-yi Lee, Yanfeng Lu, Minghui Dong
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	125-129
Number of pages	5
ISBN (Electronic)	9798350397963
DOIs	https://doi.org/10.1109/ISCSLP57327.2022.10037878
Publication status	Published - 2022
Externally published	Yes
Event	13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 - Singapore, Singapore. Duration: 11 Dec 2022 → 14 Dec 2022

Publication series

Name	2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Conference

Conference	13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Country/Territory	Singapore
City	Singapore
Period	11/12/22 → 14/12/22

Keywords

directional coherence features
masking neural beamformer
multichannel speech enhancement

Access to Document

10.1109/ISCSLP57327.2022.10037878

Cite this

Nie, S., Liang, S., Yang, Z., Xiao, L., Liu, W., & Tao, J. (2022). Masking-based Neural Beamformer for Multichannel Speech Enhancement. In K. A. Lee, H. Lee, Y. Lu, & M. Dong (Eds.), 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 (pp. 125-129). (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCSLP57327.2022.10037878

Nie, Shuai ; Liang, Shan ; Yang, Zhanlei et al. / Masking-based Neural Beamformer for Multichannel Speech Enhancement. 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. editor / Kong Aik Lee ; Hung-yi Lee ; Yanfeng Lu ; Minghui Dong. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 125-129 (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022).

@inproceedings{370068c1669741df8ca8174984f7d9ed,

title = "Masking-based Neural Beamformer for Multichannel Speech Enhancement",

abstract = "Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking can significantly reduce noise, but it inevitably introduces speech distortion, especially in far-field reverberant environments. Beamforming, by contrast, can effectively avoid speech distortion and performs very well in reverberant conditions. Masking-based beamforming is therefore a natural way to combine the two. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under a certain sound-field assumption, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. Mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.",

keywords = "directional coherence features, masking neural beamformer, multichannel speech enhancement",

author = "Shuai Nie and Shan Liang and Zhanlei Yang and Longshuai Xiao and Wenju Liu and Jianhua Tao",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022 ; Conference date: 11-12-2022 Through 14-12-2022",

year = "2022",

doi = "10.1109/ISCSLP57327.2022.10037878",

language = "English",

series = "2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "125--129",

editor = "Lee, {Kong Aik} and Hung-yi Lee and Yanfeng Lu and Minghui Dong",

booktitle = "2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022",

}

Nie, S, Liang, S, Yang, Z, Xiao, L, Liu, W & Tao, J 2022, Masking-based Neural Beamformer for Multichannel Speech Enhancement. in KA Lee, H Lee, Y Lu & M Dong (eds), 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Institute of Electrical and Electronics Engineers Inc., pp. 125-129, 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022, Singapore, Singapore, 11/12/22. https://doi.org/10.1109/ISCSLP57327.2022.10037878

Masking-based Neural Beamformer for Multichannel Speech Enhancement. / Nie, Shuai; Liang, Shan; Yang, Zhanlei et al.
2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. ed. / Kong Aik Lee; Hung-yi Lee; Yanfeng Lu; Minghui Dong. Institute of Electrical and Electronics Engineers Inc., 2022. p. 125-129 (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022).

TY - GEN

T1 - Masking-based Neural Beamformer for Multichannel Speech Enhancement

AU - Nie, Shuai

AU - Liang, Shan

AU - Yang, Zhanlei

AU - Xiao, Longshuai

AU - Liu, Wenju

AU - Tao, Jianhua

PY - 2022

Y1 - 2022

N2 - Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking can significantly reduce noise, but it inevitably introduces speech distortion, especially in far-field reverberant environments. Beamforming, by contrast, can effectively avoid speech distortion and performs very well in reverberant conditions. Masking-based beamforming is therefore a natural way to combine the two. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under a certain sound-field assumption, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. Mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.

AB - Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking can significantly reduce noise, but it inevitably introduces speech distortion, especially in far-field reverberant environments. Beamforming, by contrast, can effectively avoid speech distortion and performs very well in reverberant conditions. Masking-based beamforming is therefore a natural way to combine the two. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under a certain sound-field assumption, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. Mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.

KW - directional coherence features

KW - masking neural beamformer

KW - multichannel speech enhancement

UR - http://www.scopus.com/inward/record.url?scp=85148571211&partnerID=8YFLogxK

U2 - 10.1109/ISCSLP57327.2022.10037878

DO - 10.1109/ISCSLP57327.2022.10037878

M3 - Conference Proceeding

AN - SCOPUS:85148571211

T3 - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

SP - 125

EP - 129

BT - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

A2 - Lee, Kong Aik

A2 - Lee, Hung-yi

A2 - Lu, Yanfeng

A2 - Dong, Minghui

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022

Y2 - 11 December 2022 through 14 December 2022

ER -

Nie S, Liang S, Yang Z, Xiao L, Liu W, Tao J. Masking-based Neural Beamformer for Multichannel Speech Enhancement. In Lee KA, Lee H, Lu Y, Dong M, editors, 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 125-129. (2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022). doi: 10.1109/ISCSLP57327.2022.10037878
