Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample

Zhiyong Chen; Zhiqi Ai; Xinnuo Li; Shugong Xu

doi:10.1109/SLT61566.2024.10832359

Zhiyong Chen^*, Zhiqi Ai, Xinnuo Li, Shugong Xu

^*Corresponding author for this work

Shanghai University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

This paper introduces a novel framework for open-set speaker identification (SID) in household environments, a task that plays a crucial role in facilitating seamless human-computer interaction. To address the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid-tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system improves open-set performance by up to 27% over direct use of the efficient WavLM Base+ model. Detailed information on the open-sourced implementation is available on our project website.

Original language	English
Title of host publication	Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1144-1149
Number of pages	6
ISBN (Electronic)	9798350392258
DOIs	https://doi.org/10.1109/SLT61566.2024.10832359
Publication status	Published - 2024
Externally published	Yes
Event	2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China Duration: 2 Dec 2024 → 5 Dec 2024

Publication series

Name	Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference	2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory	China
City	Macao
Period	2/12/24 → 5/12/24

Keywords

few-shot learning
open-set learning
Speaker identification
speaker recognition
speech synthesis

Access to Document

10.1109/SLT61566.2024.10832359

Cite this

Chen, Z., Ai, Z., Li, X., & Xu, S. (2024). Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024 (pp. 1144-1149). (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT61566.2024.10832359

Chen, Zhiyong ; Ai, Zhiqi ; Li, Xinnuo et al. / Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample. Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. pp. 1144-1149 (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024).

@inproceedings{c0e30dd6ba6d4ec0b62b77673ea4d7e1,

title = "Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample",

abstract = "This paper introduces a novel framework for open-set speaker identification (SID) in household environments, a task that plays a crucial role in facilitating seamless human-computer interaction. To address the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid-tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system improves open-set performance by up to 27% over direct use of the efficient WavLM Base+ model. Detailed information on the open-sourced implementation is available on our project website.",

keywords = "few-shot learning, open-set learning, Speaker identification, speaker recognition, speech synthesis",

author = "Zhiyong Chen and Zhiqi Ai and Xinnuo Li and Shugong Xu",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE Spoken Language Technology Workshop, SLT 2024 ; Conference date: 02-12-2024 Through 05-12-2024",

year = "2024",

doi = "10.1109/SLT61566.2024.10832359",

language = "English",

series = "Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1144--1149",

booktitle = "Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024",

}

Chen, Z, Ai, Z, Li, X & Xu, S 2024, Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample. in Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024, Institute of Electrical and Electronics Engineers Inc., pp. 1144-1149, 2024 IEEE Spoken Language Technology Workshop, SLT 2024, Macao, China, 2/12/24. https://doi.org/10.1109/SLT61566.2024.10832359

Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample. / Chen, Zhiyong; Ai, Zhiqi; Li, Xinnuo et al.
Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 1144-1149 (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024).

TY - GEN

T1 - Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample

AU - Chen, Zhiyong

AU - Ai, Zhiqi

AU - Li, Xinnuo

AU - Xu, Shugong

PY - 2024

Y1 - 2024

N2 - This paper introduces a novel framework for open-set speaker identification (SID) in household environments, a task that plays a crucial role in facilitating seamless human-computer interaction. To address the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid-tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system improves open-set performance by up to 27% over direct use of the efficient WavLM Base+ model. Detailed information on the open-sourced implementation is available on our project website.

AB - This paper introduces a novel framework for open-set speaker identification (SID) in household environments, a task that plays a crucial role in facilitating seamless human-computer interaction. To address the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid-tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system improves open-set performance by up to 27% over direct use of the efficient WavLM Base+ model. Detailed information on the open-sourced implementation is available on our project website.

KW - few-shot learning

KW - open-set learning

KW - Speaker identification

KW - speaker recognition

KW - speech synthesis

UR - http://www.scopus.com/inward/record.url?scp=85217432657&partnerID=8YFLogxK

U2 - 10.1109/SLT61566.2024.10832359

DO - 10.1109/SLT61566.2024.10832359

M3 - Conference Proceeding

AN - SCOPUS:85217432657

T3 - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

SP - 1144

EP - 1149

BT - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Y2 - 2 December 2024 through 5 December 2024

ER -

Chen Z, Ai Z, Li X, Xu S. Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024. Institute of Electrical and Electronics Engineers Inc. 2024. p. 1144-1149. (Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024). doi: 10.1109/SLT61566.2024.10832359
