TY - JOUR
T1 - ASVspoof 2021
T2 - Towards Spoofed and Deepfake Speech Detection in the Wild
AU - Liu, Xuechen
AU - Wang, Xin
AU - Sahidullah, Md
AU - Patino, Jose
AU - Delgado, Hector
AU - Kinnunen, Tomi
AU - Todisco, Massimiliano
AU - Yamagishi, Junichi
AU - Evans, Nicholas
AU - Nautsch, Andreas
AU - Lee, Kong Aik
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2023
Y1 - 2023
N2 - Benchmarking initiatives support the meaningful comparison of competing solutions to prominent problems in speech and language processing. Successive benchmarking evaluations typically reflect a progressive evolution from ideal lab conditions towards to those encountered in the wild. ASVspoof, the spoofing and deepfake detection initiative and challenge series, has followed the same trend. This article provides a summary of the ASVspoof 2021 challenge and the results of 54 participating teams that submitted to the evaluation phase. For the logical access (LA) task, results indicate that countermeasures are robust to newly introduced encoding and transmission effects. Results for the physical access (PA) task indicate the potential to detect replay attacks in real, as opposed to simulated physical spaces, but a lack of robustness to variations between simulated and real acoustic environments. The Deepfake (DF) task, new to the 2021 edition, targets solutions to the detection of manipulated, compressed speech data posted online. While detection solutions offer some resilience to compression effects, they lack generalization across different source datasets. In addition to a summary of the top-performing systems for each task, new analyses of influential data factors and results for hidden data subsets, the article includes a review of post-challenge results, an outline of the principal challenge limitations and a road-map for the future of ASVspoof.
AB - Benchmarking initiatives support the meaningful comparison of competing solutions to prominent problems in speech and language processing. Successive benchmarking evaluations typically reflect a progressive evolution from ideal lab conditions towards to those encountered in the wild. ASVspoof, the spoofing and deepfake detection initiative and challenge series, has followed the same trend. This article provides a summary of the ASVspoof 2021 challenge and the results of 54 participating teams that submitted to the evaluation phase. For the logical access (LA) task, results indicate that countermeasures are robust to newly introduced encoding and transmission effects. Results for the physical access (PA) task indicate the potential to detect replay attacks in real, as opposed to simulated physical spaces, but a lack of robustness to variations between simulated and real acoustic environments. The Deepfake (DF) task, new to the 2021 edition, targets solutions to the detection of manipulated, compressed speech data posted online. While detection solutions offer some resilience to compression effects, they lack generalization across different source datasets. In addition to a summary of the top-performing systems for each task, new analyses of influential data factors and results for hidden data subsets, the article includes a review of post-challenge results, an outline of the principal challenge limitations and a road-map for the future of ASVspoof.
KW - ASVspoof
KW - countermeasures
KW - deepfakes
KW - presentation attack detection
KW - spoofing
UR - https://www.scopus.com/pages/publications/85162920923
U2 - 10.1109/TASLP.2023.3285283
DO - 10.1109/TASLP.2023.3285283
M3 - Article
AN - SCOPUS:85162920923
SN - 2329-9290
VL - 31
SP - 2507
EP - 2522
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
ER -