Enhancing Open-Set Speaker Identification Through Rapid Tuning With Speaker Reciprocal Points and Negative Sample

Zhiyong Chen*, Zhiqi Ai, Xinnuo Li, Shugong Xu

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

Abstract

This paper introduces a novel framework for open-set speaker identification (SID) in household environments, a capability that plays a crucial role in facilitating seamless human-computer interaction. To address the limitations of current speaker models and classification approaches, our work integrates a pretrained WavLM frontend with a few-shot rapid-tuning neural network (NN) backend for enrollment, employing task-optimized Speaker Reciprocal Points Learning (SRPL) to enhance discrimination across multiple target speakers. Furthermore, we propose an enhanced version of SRPL (SRPL+), which incorporates negative sample learning with both speech-synthesized and real negative samples to significantly improve open-set SID accuracy. Our approach is thoroughly evaluated across various multi-language text-dependent speaker recognition datasets, demonstrating its effectiveness in achieving high usability for complex household multi-speaker recognition scenarios. The proposed system improves open-set performance by up to 27% over direct use of the efficient WavLM base+ model. Detailed information and the open-source implementation are available on our project website.
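
As an illustration of the open-set decision that the abstract refers to, the following is a minimal sketch of a generic baseline: score a test embedding against each enrolled speaker's averaged embedding and reject it as unknown when the best score falls below a threshold. It is not the SRPL or SRPL+ method described in the paper. The function names, the toy three-dimensional embeddings, and the threshold value of 0.8 are all assumptions made for illustration; in practice the embeddings would come from a speaker model such as the WavLM frontend.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings: List[Vector]) -> Vector:
    """Average a few enrollment embeddings into one speaker template."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def identify_open_set(
    test_embedding: Vector,
    enrolled: Dict[str, Vector],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Return (speaker, score), or (None, score) when the input is rejected."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, template in enrolled.items():
        score = cosine_similarity(test_embedding, template)
        if score > best_score:
            best_name, best_score = name, score
    # Open-set step: a low best score means "none of the enrolled speakers".
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy data (hypothetical): two household members, two enrollment samples each.
enrolled = {
    "alice": mean_embedding([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "bob": mean_embedding([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}

known = identify_open_set([0.95, 0.05, 0.0], enrolled, threshold=0.8)
unknown = identify_open_set([0.0, 0.0, 1.0], enrolled, threshold=0.8)
```

In this toy setup `known` resolves to the enrolled speaker "alice", while `unknown` is rejected because its best score is below the threshold. A fixed threshold on similarity scores is only the simplest possible rejection rule; the abstract's claim is that a tuned backend trained with reciprocal points and negative samples improves on directly using the pretrained model.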

Original language: English
Title of host publication: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1144-1149
Number of pages: 6
ISBN (Electronic): 9798350392258
Publication status: Published - 2024
Externally published: Yes
Event: 2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China
Duration: 2 Dec 2024 - 5 Dec 2024

Publication series

Name: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference: 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory: China
City: Macao
Period: 2/12/24 - 5/12/24

Keywords

  • few-shot learning
  • open-set learning
  • speaker identification
  • speaker recognition
  • speech synthesis
