Skip to main navigation Skip to search Skip to main content

Learnable MFCCs for speaker verification

  • University of Eastern Finland
  • Université de Lorraine

Research output: Chapter in Book or Report/Conference proceedingConference Proceedingpeer-review

16 Citations (Scopus)

Abstract

We propose a learnable mel-frequency cepstral coefficients (MFCCs) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven version of four linear transforms in a standard MFCC extractor - windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). Results reported reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in term of equal error rate (EER) from static MFCCs, without additional tuning effort.

Original languageEnglish
Title of host publication2021 IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728192017
DOIs
Publication statusPublished - 2021
Event53rd IEEE International Symposium on Circuits and Systems, ISCAS 2021 - Daegu, Korea, Republic of
Duration: 22 May 202128 May 2021

Publication series

NameProceedings - IEEE International Symposium on Circuits and Systems
Volume2021-May
ISSN (Print)0271-4310

Conference

Conference53rd IEEE International Symposium on Circuits and Systems, ISCAS 2021
Country/TerritoryKorea, Republic of
CityDaegu
Period22/05/2128/05/21

Keywords

  • Feature extraction
  • Mel-frequency cesptral coefficients (MFCCs)
  • Speaker verification

Cite this