TY - JOUR
T1 - Towards single integrated spoofing-aware speaker verification embeddings
AU - Mun, Sung Hwan
AU - Shim, Hye Jin
AU - Tak, Hemlata
AU - Wang, Xin
AU - Liu, Xuechen
AU - Sahidullah, Md
AU - Jeong, Myeonghun
AU - Han, Min Hyun
AU - Todisco, Massimiliano
AU - Lee, Kong Aik
AU - Yamagishi, Junichi
AU - Evans, Nicholas
AU - Kinnunen, Tomi
AU - Kim, Nam Soo
AU - Jung, Jee Weon
N1 - Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
PY - 2023
Y1 - 2023
N2 - This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two requirements. First, they should reject both non-target speakers' inputs and target speakers' spoofed inputs. Second, they should perform competitively with the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single-embedding solutions by a large margin in the SASV2022 challenge. Our analysis attributes the inferior performance of single SASV embeddings to an insufficient amount of training data and the distinct nature of the ASV and CM tasks. To address this, we propose a novel framework that includes multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also exploited to address the lack of spoofed data. Experimental results show dramatic improvements, achieving an SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
AB - This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two requirements. First, they should reject both non-target speakers' inputs and target speakers' spoofed inputs. Second, they should perform competitively with the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single-embedding solutions by a large margin in the SASV2022 challenge. Our analysis attributes the inferior performance of single SASV embeddings to an insufficient amount of training data and the distinct nature of the ASV and CM tasks. To address this, we propose a novel framework that includes multi-stage training and a combination of loss functions. Copy synthesis, combined with several vocoders, is also exploited to address the lack of spoofed data. Experimental results show dramatic improvements, achieving an SASV-EER of 1.06% on the evaluation protocol of the SASV2022 challenge.
KW - anti-spoofing
KW - speaker verification
KW - spoofing-aware speaker verification
UR - https://www.scopus.com/pages/publications/85171579525
U2 - 10.21437/Interspeech.2023-1402
DO - 10.21437/Interspeech.2023-1402
M3 - Conference article
AN - SCOPUS:85171579525
SN - 2308-457X
VL - 2023-August
SP - 3989
EP - 3993
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 24th Annual conference of the International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -