TY - JOUR
T1 - Deep Normalization for Speaker Vectors
AU - Cai, Yunqi
AU - Li, Lantian
AU - Abel, Andrew
AU - Zhu, Xiaoyan
AU - Wang, Dong
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2021
Y1 - 2021
N2 - Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks. However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers. These irregular distributions can seriously impact speaker recognition performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distribution. In this article, we argue that deep speaker vectors require deep normalization, and propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach with experiments using the widely used SITW and CNCeleb corpora. In these experiments, the DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.
KW - Normalization flow
KW - speaker embedding
KW - speaker recognition
UR - http://www.scopus.com/inward/record.url?scp=85098762552&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2020.3039573
DO - 10.1109/TASLP.2020.3039573
M3 - Article
AN - SCOPUS:85098762552
SN - 2329-9290
VL - 29
SP - 733
EP - 744
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
M1 - 9296778
ER -