Frequency-dependent auto-pooling function for weakly supervised sound event detection

Sichen Liu, Feiran Yang*, Yin Cao, Jun Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)


Sound event detection (SED), which is typically treated as a supervised problem, aims to detect the types of sound events in an audio clip together with their temporal information. Supervised training requires onset and offset annotations for the sound events at the frame level. Many available sound event datasets, however, contain only audio tags without precise temporal information; such datasets are referred to as weakly labeled. In this paper, we propose a novel source separation-based method, trained on weakly labeled data, to solve SED problems. We build a dilated depthwise separable convolution block (DDC-block) to estimate the time-frequency (T-F) mask of each sound event from a T-F representation of an audio clip. The DDC-block is shown experimentally to be more effective and computationally lighter than a “VGG-like” block. To fully utilize the frequency characteristics of sound events, we then propose a frequency-dependent auto-pooling (FAP) function to obtain the clip-level presence probability of each sound event class. The combination of the two schemes, named the DDC-FAP method, is evaluated on the DCASE 2018 Task 2, DCASE 2020 Task 4, and DCASE 2017 Task 4 datasets. The results show that DDC-FAP performs better than the state-of-the-art source separation-based method on the SED task.
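
As a rough illustration of the pooling idea mentioned in the abstract, the sketch below reduces a T-F mask for one event class to a single clip-level score. It assumes the commonly used auto-pooling form, a softmax-weighted average with a sharpness parameter alpha, and makes that parameter frequency-dependent by giving each frequency bin its own alpha. The function names, the per-bin parameterisation, and the final average over frequency bins are assumptions made for illustration; the abstract does not give the exact FAP formulation.

```python
import math


def auto_pool(probs, alpha):
    """Softmax-weighted average of `probs` with sharpness `alpha`.

    alpha = 0 reduces to mean pooling; a large alpha approaches max pooling.
    """
    # Subtract the largest exponent before exponentiating, for numerical stability.
    shift = max(alpha * p for p in probs)
    weights = [math.exp(alpha * p - shift) for p in probs]
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


def frequency_dependent_auto_pool(mask, alphas):
    """Pool a T-F mask (F rows of T values in [0, 1]) to one clip-level score.

    Assumption: each frequency bin is pooled over time with its own alpha,
    and the per-bin results are then averaged over frequency.
    """
    if len(mask) != len(alphas):
        raise ValueError("need exactly one alpha per frequency bin")
    per_bin = [auto_pool(row, a) for row, a in zip(mask, alphas)]
    return sum(per_bin) / len(per_bin)


# Toy mask: 2 frequency bins, 2 time frames (values are made up).
mask = [[0.2, 0.8], [0.4, 0.4]]
score = frequency_dependent_auto_pool(mask, alphas=[0.0, 5.0])
```

In a trained model the alphas would be learned jointly with the network, so that each frequency bin can settle anywhere between mean-like and max-like pooling.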

Original language: English
Article number: 19
Journal: Eurasip Journal on Audio, Speech, and Music Processing
Issue number: 1
Publication status: Published - Dec 2021
Externally published: Yes


  • Auto-pooling function
  • Depthwise separable convolution
  • Sound event detection
  • Weakly supervised

