Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction

Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

17 Citations (Scopus)

Abstract

SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture using an adaptation utterance. The existing multi-channel SpeakerBeam uses the spectral features of the signals but ignores the spatial discriminability offered by multi-channel processing. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network is then designed to identify the direction of the target speaker and to combine the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam then takes the enhanced signal as input and outputs the mask of the target speaker. Finally, the attention network and SpeakerBeam are jointly trained. Experimental results demonstrate that the proposed scheme substantially improves on the existing multi-channel SpeakerBeam in low signal-to-interference ratio or same-gender scenarios.
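
The attention-based combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how attention scores are computed, so the dot-product scoring, the softmax normalisation, and all names (`attention_combine`, `beam_features`, `speaker_embedding`) are assumptions made purely for illustration. The sketch only shows the general idea of weighting a set of fixed-beamformer outputs so that the beam most similar to the target speaker dominates the combined signal.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_combine(beam_features, beam_signals, speaker_embedding):
    """Combine beamformed signals with attention weights.

    ASSUMPTION: each beam is scored by the dot product between a per-beam
    feature vector and a speaker embedding derived from the adaptation
    utterance. The scoring function used in the paper may differ.
    """
    scores = [
        sum(f * e for f, e in zip(feature, speaker_embedding))
        for feature in beam_features
    ]
    weights = softmax(scores)
    length = len(beam_signals[0])
    # Weighted sum across beams, sample by sample.
    enhanced = [
        sum(w * signal[t] for w, signal in zip(weights, beam_signals))
        for t in range(length)
    ]
    return weights, enhanced


# Toy data (hypothetical): three fixed beams, two-dimensional features.
beam_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
beam_signals = [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
speaker_embedding = [2.0, 0.0]

weights, enhanced = attention_combine(
    beam_features, beam_signals, speaker_embedding
)
```

In this toy example the first beam receives the largest weight because its feature vector is most aligned with the hypothetical speaker embedding. In the scheme the abstract describes, the enhanced signal would then be passed to SpeakerBeam to estimate the target speaker's mask, and both networks would be trained jointly.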

Original language: English
Title of host publication: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Pages: 2713-2717
Number of pages: 5
Volume: 2019-September
Publication status: Published - 2019
Externally published: Yes
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sept 2019 to 19 Sept 2019

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Country/Territory: Austria
City: Graz
Period: 15/09/19 to 19/09/19

Keywords

  • Fixed beamforming
  • Jointly training
  • Multi-channel signal processing
  • Speaker extraction
