Exploiting Large-Scale Teacher-Student Training for On-Device Acoustic Models

Jing Liu*, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

4 Citations (Scopus)

Abstract

We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AMs), with experiments spanning over 3,000 hours of GPU time, making our study one of the largest of its kind. We discuss SSL for AMs in a small-footprint setting, showing that a smaller-capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by a 14.3% word error rate reduction (WERR). When the supervised data is increased seven-fold, our gains diminish to 7.1% WERR; to improve SSL efficiency in this larger supervised data regime, we employ step-wise distillation into a smaller model, obtaining a WERR of 14.4%. We then switch to SSL using larger student models in low-data regimes; while learning efficiency with unsupervised data is higher in this setting, student models may outperform teacher models. We develop a theoretical sketch to explain this behavior.
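The teacher-student distillation the abstract refers to trains a small student model to match a larger teacher's output distribution over senones or phonemes. As a minimal illustrative sketch (the function names and the temperature value here are our own choices for exposition, not details from the paper), the core objective is the cross-entropy between the teacher's temperature-softened posteriors and the student's:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperature flattens the
    # distribution, exposing the teacher's "dark knowledge".
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy of the student's distribution against the teacher's
    # soft targets; minimized when the student matches the teacher.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

A student whose logits already match the teacher's incurs a lower loss than one producing a uniform distribution, which is what drives the student toward the teacher's behavior during training.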

Original language: English
Title of host publication: Text, Speech, and Dialogue - 24th International Conference, TSD 2021, Proceedings
Editors: Kamil Ekštein, František Pártl, Miloslav Konopík
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 413-424
Number of pages: 12
ISBN (Print): 9783030835262
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: 24th International Conference on Text, Speech, and Dialogue, TSD 2021 - Olomouc, Czech Republic
Duration: 6 Sept 2021 → 9 Sept 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12848 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Text, Speech, and Dialogue, TSD 2021
Country/Territory: Czech Republic
City: Olomouc
Period: 6/09/21 → 9/09/21

Keywords

  • Acoustic models
  • Edge computing
  • Semi-supervised learning
  • Speech recognition
  • Student-teacher learning
