TY - GEN
T1 - A comparative re-assessment of feature extractors for deep speaker embeddings
AU - Liu, Xuechen
AU - Sahidullah, Md
AU - Kinnunen, Tomi
N1 - Publisher Copyright: © 2020 International Speech Communication Association. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate up to 16.3% (VoxCeleb) and 25.1% (SITW) relative decrease in equal error rate (EER) compared to the baseline.
KW - Deep speaker embeddings
KW - Feature extraction
KW - Speaker verification
UR - https://www.scopus.com/pages/publications/85098234248
U2 - 10.21437/Interspeech.2020-1765
DO - 10.21437/Interspeech.2020-1765
M3 - Conference Proceeding
AN - SCOPUS:85098234248
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 3221
EP - 3225
BT - Interspeech 2020
PB - International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -