Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction

Guanjun Li; Shan Liang; Shuai Nie; Wenju Liu; Meng Yu; Lianwu Chen; Shouye Peng; Changliang Li

doi:10.21437/Interspeech.2019-1474

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

17 Citations (Scopus)

Abstract

SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture, given an adaptation utterance from that speaker. The existing multi-channel SpeakerBeam relies on the spectral features of the signals and ignores the spatial discriminability that multi-channel processing provides. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network then identifies the direction of the target speaker and combines the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam takes this enhanced signal as input and outputs a mask for the target speaker. Finally, the attention network and SpeakerBeam are trained jointly. Experimental results demonstrate that the proposed scheme substantially improves on the existing multi-channel SpeakerBeam in low signal-to-interference ratio (SIR) and same-gender scenarios.
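
The fixed beamforming stage can be sketched as a bank of delay-and-sum beamformers, one per look direction. The abstract does not state the array geometry or how the fixed beam patterns are designed, so this is only a minimal sketch assuming a uniform linear array, far-field sources, a speed of sound of 343 m/s and a fixed grid of look angles; the function names and parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_vectors(angles_deg, num_mics, spacing, freqs):
    """Far-field steering vectors for an assumed uniform linear array.

    Returns an array of shape (K, F, M): K look directions, F frequencies, M mics.
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))   # (K,)
    mic_pos = spacing * np.arange(num_mics)                    # (M,) in metres
    # Relative propagation delay at each mic for each direction: (K, M)
    delays = np.outer(np.cos(theta), mic_pos) / SPEED_OF_SOUND
    # Phase term exp(-j 2 pi f tau): (K, F, M)
    return np.exp(-2j * np.pi * freqs[None, :, None] * delays[:, None, :])


def fixed_beamformer_bank(X, angles_deg, spacing, freqs):
    """Delay-and-sum beam for each fixed look direction.

    X: multi-channel STFT of shape (M, F, T). Returns beams of shape (K, F, T).
    """
    num_mics = X.shape[0]
    D = steering_vectors(angles_deg, num_mics, spacing, freqs)  # (K, F, M)
    W = D / num_mics                                            # delay-and-sum weights
    # Y_k(f, t) = sum_m conj(W_k(f, m)) * X_m(f, t)
    return np.einsum("kfm,mft->kft", W.conj(), X)
```

A plane wave arriving from one of the look directions passes through the matching beam with unit gain, while the other beams attenuate it (more strongly at higher frequencies). With this simple design, the microphone spacing should stay below half the shortest wavelength of interest to avoid grating lobes.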

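The attention and mask-estimation stages can be sketched as a forward pass over K beamformed signals. The abstract states only that an attention network selects and combines the beams and that SpeakerBeam outputs a mask; the additive-attention scoring, the log-magnitude features, the multiplicative speaker-adaptation layer (loosely modeled on one common SpeakerBeam variant) and every parameter name below are hypothetical choices for illustration. In the paper both networks are trained jointly; this NumPy sketch shows inference with given weights only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_combine(beams, spk_emb, W_beam, W_spk, v):
    """Weight K beamformed STFTs (K, F, T) into one enhanced STFT (F, T).

    spk_emb: (E,) embedding of the adaptation utterance (hypothetical).
    W_beam: (H, F), W_spk: (H, E), v: (H,) are hypothetical attention weights.
    """
    # Utterance-level log-magnitude summary of each beam: (K, F)
    feats = np.log1p(np.abs(beams)).mean(axis=2)
    # Additive attention score per beam: v^T tanh(W_beam f_k + W_spk e)
    hidden = np.tanh(feats @ W_beam.T + W_spk @ spk_emb)   # (K, H)
    weights = softmax(hidden @ v)                           # (K,), sums to 1
    enhanced = np.einsum("k,kft->ft", weights, beams)       # convex combination
    return enhanced, weights


def speaker_mask(enhanced, spk_emb, W_in, W_out):
    """Toy speaker-conditioned mask estimator on the enhanced STFT.

    W_in: (E, F), W_out: (F, E). The hidden layer is scaled element-wise by
    the speaker embedding (an assumed multiplicative adaptation layer).
    """
    feats = np.log1p(np.abs(enhanced))                          # (F, T)
    hidden = np.maximum(W_in @ feats, 0.0) * spk_emb[:, None]   # (E, T)
    mask = sigmoid(W_out @ hidden)                              # (F, T), values in [0, 1]
    return mask, mask * enhanced                                # masked target STFT
```

Because the beam weights come from a softmax, the enhanced signal is a convex combination of the fixed beams. Training the attention and mask networks together, as the abstract describes, would let the mask-estimation loss shape which beam the attention favours.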
Original language	English
Title of host publication	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Pages	2713-2717
Number of pages	5
Volume	2019-September
DOIs	https://doi.org/10.21437/Interspeech.2019-1474
Publication status	Published - 2019
Externally published	Yes
Event	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration	15 Sept 2019 → 19 Sept 2019

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print)	2308-457X

Conference

Conference	20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Country/Territory	Austria
City	Graz
Period	15/09/19 → 19/09/19

Keywords

Fixed beamforming
Joint training
Multi-channel signal processing
Speaker extraction

Cite this

Li, G., Liang, S., Nie, S., Liu, W., Yu, M., Chen, L., Peng, S., & Li, C. (2019). Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. In 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 (Vol. 2019-September, pp. 2713-2717). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). https://doi.org/10.21437/Interspeech.2019-1474

Li, Guanjun ; Liang, Shan ; Nie, Shuai et al. / Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019. Vol. 2019-September 2019. pp. 2713-2717 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{fe85c385135c42f5b740217a763deb3e,

title = "Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction",

abstract = "SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture, given an adaptation utterance from that speaker. The existing multi-channel SpeakerBeam relies on the spectral features of the signals and ignores the spatial discriminability that multi-channel processing provides. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network then identifies the direction of the target speaker and combines the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam takes this enhanced signal as input and outputs a mask for the target speaker. Finally, the attention network and SpeakerBeam are trained jointly. Experimental results demonstrate that the proposed scheme substantially improves on the existing multi-channel SpeakerBeam in low signal-to-interference ratio (SIR) and same-gender scenarios.",

keywords = "Fixed beamforming, Joint training, Multi-channel signal processing, Speaker extraction",

author = "Guanjun Li and Shan Liang and Shuai Nie and Wenju Liu and Meng Yu and Lianwu Chen and Shouye Peng and Changliang Li",

note = "Publisher Copyright: Copyright {\textcopyright} 2019 ISCA; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 ; Conference date: 15-09-2019 Through 19-09-2019",

year = "2019",

doi = "10.21437/Interspeech.2019-1474",

language = "English",

volume = "2019-September",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

pages = "2713--2717",

booktitle = "20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019",

}

Li, G, Liang, S, Nie, S, Liu, W, Yu, M, Chen, L, Peng, S & Li, C 2019, Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. in 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019. vol. 2019-September, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 2713-2717, 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz, Austria, 15/09/19. https://doi.org/10.21437/Interspeech.2019-1474

Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. / Li, Guanjun; Liang, Shan; Nie, Shuai et al.
20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019. Vol. 2019-September 2019. p. 2713-2717 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).


TY - GEN

T1 - Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction

AU - Li, Guanjun

AU - Liang, Shan

AU - Nie, Shuai

AU - Liu, Wenju

AU - Yu, Meng

AU - Chen, Lianwu

AU - Peng, Shouye

AU - Li, Changliang

PY - 2019

Y1 - 2019

N2 - SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture, given an adaptation utterance from that speaker. The existing multi-channel SpeakerBeam relies on the spectral features of the signals and ignores the spatial discriminability that multi-channel processing provides. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network then identifies the direction of the target speaker and combines the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam takes this enhanced signal as input and outputs a mask for the target speaker. Finally, the attention network and SpeakerBeam are trained jointly. Experimental results demonstrate that the proposed scheme substantially improves on the existing multi-channel SpeakerBeam in low signal-to-interference ratio (SIR) and same-gender scenarios.

AB - SpeakerBeam is a state-of-the-art method for extracting the speech signal of a target speaker from a mixture, given an adaptation utterance from that speaker. The existing multi-channel SpeakerBeam relies on the spectral features of the signals and ignores the spatial discriminability that multi-channel processing provides. In this paper, we tightly integrate spectral and spatial information for target speaker extraction. In the proposed scheme, a multi-channel mixture signal is first filtered into a set of beamformed signals using fixed beam patterns. An attention network then identifies the direction of the target speaker and combines the beamformed signals into an enhanced signal dominated by the target speaker's energy. SpeakerBeam takes this enhanced signal as input and outputs a mask for the target speaker. Finally, the attention network and SpeakerBeam are trained jointly. Experimental results demonstrate that the proposed scheme substantially improves on the existing multi-channel SpeakerBeam in low signal-to-interference ratio (SIR) and same-gender scenarios.

KW - Fixed beamforming

KW - Joint training

KW - Multi-channel signal processing

KW - Speaker extraction

UR - http://www.scopus.com/inward/record.url?scp=85074732125&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2019-1474

DO - 10.21437/Interspeech.2019-1474

M3 - Conference Proceeding

AN - SCOPUS:85074732125

VL - 2019-September

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 2713

EP - 2717

BT - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019

Y2 - 15 September 2019 through 19 September 2019

ER -

Li G, Liang S, Nie S, Liu W, Yu M, Chen L et al. Direction-Aware Speaker Beam for Multi-Channel Speaker Extraction. In 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019. Vol. 2019-September. 2019. p. 2713-2717. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2019-1474
