TY - GEN
T1 - Masking-based Neural Beamformer for Multichannel Speech Enhancement
AU - Nie, Shuai
AU - Liang, Shan
AU - Yang, Zhanlei
AU - Xiao, Longshuai
AU - Liu, Wenju
AU - Tao, Jianhua
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking technology can significantly reduce noise, but inevitably leads to speech distortion, especially in far-field reverberant environments. Beamforming technology, by contrast, can effectively avoid speech distortion and performs well in reverberant conditions. A masking-based beamforming scheme is therefore an attractive alternative. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under certain sound-field assumptions, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. The mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.
AB - Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking technology can significantly reduce noise, but inevitably leads to speech distortion, especially in far-field reverberant environments. Beamforming technology, by contrast, can effectively avoid speech distortion and performs well in reverberant conditions. A masking-based beamforming scheme is therefore an attractive alternative. However, most methods use fixed or adaptive beamformers as spatial filters, which are either designed in advance under certain sound-field assumptions, with limited noise reduction ability, or involve a complex matrix inversion at each frequency, with high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer to jointly model masking and beamforming in a data-driven manner. The mask prediction and the neural beamformer are jointly optimized by spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose to use a pair of complementary fixed beamformers to exploit a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.
KW - directional coherence features
KW - masking neural beamformer
KW - multichannel speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85148571211&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP57327.2022.10037878
DO - 10.1109/ISCSLP57327.2022.10037878
M3 - Conference Proceeding
AN - SCOPUS:85148571211
T3 - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
SP - 125
EP - 129
BT - 2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
A2 - Lee, Kong Aik
A2 - Lee, Hung-yi
A2 - Lu, Yanfeng
A2 - Dong, Minghui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
Y2 - 11 December 2022 through 14 December 2022
ER -