TY - JOUR
T1 - ASVspoof 5
T2 - Design, collection and validation of resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech
AU - Wang, Xin
AU - Delgado, Héctor
AU - Tak, Hemlata
AU - Jung, Jee weon
AU - Shim, Hye jin
AU - Todisco, Massimiliano
AU - Kukanov, Ivan
AU - Liu, Xuechen
AU - Sahidullah, Md
AU - Kinnunen, Tomi
AU - Evans, Nicholas
AU - Lee, Kong Aik
AU - Yamagishi, Junichi
AU - Jeong, Myeonghun
AU - Zhu, Ge
AU - Zang, Yongyi
AU - Zhang, You
AU - Maiti, Soumi
AU - Lux, Florian
AU - Müller, Nicolas
AU - Zhang, Wangyou
AU - Sun, Chengzhe
AU - Hou, Shuwei
AU - Lyu, Siwei
AU - Le Maguer, Sébastien
AU - Gong, Cheng
AU - Guo, Hanjie
AU - Chen, Liping
AU - Singh, Vishwanath
N1 - Publisher Copyright: © 2025 The Authors
PY - 2026/1
Y1 - 2026/1
N2 - ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks as well as the design of detection solutions. We introduce the ASVspoof 5 database which is generated in a crowdsourced fashion from data collected in diverse acoustic conditions (cf. studio-quality data for earlier ASVspoof databases) and from ∼2000 speakers (cf. ∼100 earlier). The database contains attacks generated with 32 different algorithms, also crowdsourced, and optimised to varying degrees using new surrogate detection models. Among them are attacks generated with a mix of legacy and contemporary text-to-speech synthesis and voice conversion models, in addition to adversarial attacks which are incorporated for the first time. ASVspoof 5 protocols comprise seven speaker-disjoint partitions. They include two distinct partitions for the training of different sets of attack models, two more for the development and evaluation of surrogate detection models, and then three additional partitions which comprise the ASVspoof 5 training, development and evaluation sets. An auxiliary set of data collected from an additional 30k speakers can also be used to train speaker encoders for the implementation of attack algorithms. Also described herein is an experimental validation of the new ASVspoof 5 database using a set of automatic speaker verification and spoof/deepfake baseline detectors. With the exception of protocols and tools for the generation of spoofed/deepfake speech, the resources described in this paper, already used by participants of the ASVspoof 5 challenge in 2024, are now all freely available to the community.
KW - ASVspoof
KW - Corpus design
KW - Countermeasures
KW - Deepfakes
KW - Presentation attack detection
KW - Spoofing
UR - https://www.scopus.com/pages/publications/105007094678
U2 - 10.1016/j.csl.2025.101825
DO - 10.1016/j.csl.2025.101825
M3 - Article
AN - SCOPUS:105007094678
SN - 0885-2308
VL - 95
JO - Computer Speech and Language
JF - Computer Speech and Language
M1 - 101825
ER -