
Teacher-Student Training for Text-Independent Speaker Recognition

  • Emotech Labs
  • University of New South Wales

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

8 Citations (Scopus)

Abstract

This paper investigates text-independent speaker recognition using neural embedding extractors based on the time-delay neural network. Our primary focus is to explore the teacher-student (TS) training framework for knowledge distillation in a text-independent (TI) speaker recognition task. We report results on both proprietary and public benchmarks, obtaining competitive results with models that are 88-93% smaller. In particular, in clean testing conditions, we find that TS training on neural-based TI systems achieves the same or better performance than the i-vector based counterparts. Neural embeddings are less prone to short-segment issues and offer better performance, particularly in the high-recall setting. They can also provide additional insights about speakers, such as gender or how difficult a given speaker is to recognize.
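As a rough illustration of what teacher-student training for knowledge distillation generally involves, the sketch below computes a generic distillation loss: the student is pushed towards the teacher's temperature-softened speaker posteriors while still being trained on the true speaker label. The abstract does not state the exact loss used in the paper, so this is the common textbook formulation, not the authors' method; the function names, the `temperature` and `alpha` parameters, and their default values are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (hypothetical helper, not from the paper).

    Combines KL(teacher || student) on temperature-softened posteriors
    with the cross-entropy of the student on the true speaker label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student is from the teacher.
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-target term: standard cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce
```

With `alpha=1.0` the loss depends only on the teacher, so a student whose logits match the teacher's incurs zero loss; lowering `alpha` shifts weight towards the ground-truth speaker labels.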

Original language: English
Title of host publication: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1044-1051
Number of pages: 8
ISBN (Electronic): 9781538643341
DOIs
Publication status: Published - 2 Jul 2018
Externally published: Yes
Event: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Athens, Greece
Duration: 18 Dec 2018 to 21 Dec 2018

Publication series

Name: 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings

Conference

Conference: 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Country/Territory: Greece
City: Athens
Period: 18/12/18 to 21/12/18

Keywords

  • Knowledge Distillation
  • Speaker Recognition
  • Teacher-Student training
