TY - GEN
T1 - Parameterized Channel Normalization for Far-Field Deep Speaker Verification
AU - Liu, Xuechen
AU - Sahidullah, Md
AU - Kinnunen, Tomi
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - We address far-field speaker verification with deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise. To mitigate these effects, we focus on two parametric normalization methods: per-channel energy normalization (PCEN) and parameterized cepstral mean normalization (PCMN). Both methods contain differentiable parameters and thus can be conveniently integrated to, and jointly optimized with the DNN using automatic differentiation methods. We consider both fixed and trainable (data-driven) variants of each method. We evaluate the performance on Hi-MIA, a recent large-scale far-field speech corpus, with varied microphone and positional settings. Our methods outperform conventional mel filterbank features, with maximum of 33.5% and 39.5% relative improvement on equal error rate under matched microphone and mismatched microphone conditions, respectively.
AB - We address far-field speaker verification with deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise. To mitigate these effects, we focus on two parametric normalization methods: per-channel energy normalization (PCEN) and parameterized cepstral mean normalization (PCMN). Both methods contain differentiable parameters and thus can be conveniently integrated to, and jointly optimized with the DNN using automatic differentiation methods. We consider both fixed and trainable (data-driven) variants of each method. We evaluate the performance on Hi-MIA, a recent large-scale far-field speech corpus, with varied microphone and positional settings. Our methods outperform conventional mel filterbank features, with maximum of 33.5% and 39.5% relative improvement on equal error rate under matched microphone and mismatched microphone conditions, respectively.
KW - acoustic feature extractor
KW - channel normalization
KW - far-field speaker verification
KW - spectrogram
UR - https://www.scopus.com/pages/publications/85125033327
U2 - 10.1109/ASRU51503.2021.9688142
DO - 10.1109/ASRU51503.2021.9688142
M3 - Conference Proceeding
AN - SCOPUS:85125033327
T3 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
SP - 1132
EP - 1138
BT - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
Y2 - 13 December 2021 through 17 December 2021
ER -