Speaker Recognition-Assisted Robust Audio Deepfake Detection

Jiahui Pan, Shuai Nie, Hui Zhang, Shulin He, Kanghao Zhang, Shan Liang, Xueliang Zhang, Jianhua Tao

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

4 Citations (Scopus)

Abstract

Audio deepfake detection is usually formulated as a binary classification between genuine and fake speech for an entire utterance. Environmental cues such as background and device noise can be used as classification features, but they are easily attacked, e.g. by simply adding real noise to the fake speech. Spectral discrimination of speech provides more robust features, which have been used in speaker recognition models to authenticate speaker identity. In this study, we propose a speaker recognition-assisted audio deepfake detector. Feature representations extracted by a speaker recognition model are introduced into multiple layers of the deepfake detector to fully exploit the inherent spectral discrimination of speech. The speaker recognition and audio deepfake detection models are jointly optimized by a multi-objective learning method. Systematic experiments on the ASVspoof 2019 logical access corpus demonstrate that the proposed approach outperforms existing single systems and significantly improves robustness to noise.
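
The joint optimization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss functions or their weighting, so everything below is an assumption: the function names, the use of softmax cross-entropy for both tasks, and the trade-off weight `alpha` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])

def joint_loss(spoof_logits, spoof_label, speaker_logits, speaker_label, alpha=0.5):
    """Weighted sum of a detection loss and a speaker-recognition loss.

    alpha is a hypothetical trade-off weight, not a value from the paper.
    """
    detection = cross_entropy(spoof_logits, spoof_label)      # genuine vs. fake
    speaker = cross_entropy(speaker_logits, speaker_label)    # speaker identity
    return detection + alpha * speaker

# Illustrative values only: two detection classes, three speakers.
loss = joint_loss([2.0, -1.0], 0, [0.1, 1.5, -0.3], 1, alpha=0.5)
```

In this sketch, setting `alpha` to 0 reduces the objective to the detection loss alone, which makes the contribution of the auxiliary speaker task easy to isolate.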

Original language: English
Title of host publication: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Pages: 4202-4206
Number of pages: 5
Volume: 2022-September
Publication status: Published - 2022
Externally published: Yes
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sept 2022 → 22 Sept 2022

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Country/Territory: Korea, Republic of
City: Incheon
Period: 18/09/22 → 22/09/22

Keywords

  • ASVspoof 2019
  • audio deepfake detection
  • speaker recognition-assisted
  • spectral discrimination
