TY - GEN
T1 - Text-Queried Target Sound Event Localization
AU - Zhao, Jinzheng
AU - Qian, Xinyuan
AU - Xu, Yong
AU - Liu, Haohe
AU - Cao, Yin
AU - Berghi, Davide
AU - Wang, Wenwu
N1 - Publisher Copyright: © 2024 European Signal Processing Conference, EUSIPCO. All rights reserved.
PY - 2024
Y1 - 2024
AB - Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of a fixed set of classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm in which the user describes a sound event in text and the SEL model predicts the location of that event. The proposed task offers a more user-friendly form of human-computer interaction. We provide a benchmark study for the proposed task and perform experiments on datasets created with simulated and real room impulse responses (RIRs) to validate the effectiveness of the proposed methods. We hope that our benchmark will inspire interest in, and further research on, text-queried sound source localization.
KW - multimodal fusion
KW - sound event localization and detection
UR - http://www.scopus.com/inward/record.url?scp=85208424553&partnerID=8YFLogxK
U2 - 10.23919/eusipco63174.2024.10715199
DO - 10.23919/eusipco63174.2024.10715199
M3 - Conference Proceeding
AN - SCOPUS:85208424553
T3 - European Signal Processing Conference
SP - 261
EP - 265
BT - 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 32nd European Signal Processing Conference, EUSIPCO 2024
Y2 - 26 August 2024 through 30 August 2024
ER -