A TRACK-WISE ENSEMBLE EVENT INDEPENDENT NETWORK FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION

Jinbo Hu; Yin Cao; Ming Wu; Qiuqiang Kong; Feiran Yang; Mark D. Plumbley; Jun Yang

doi:10.1109/ICASSP43922.2022.9747283


Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

13 Citations (Scopus)

Abstract

Polyphonic sound event localization and detection (SELD) aims to detect the types of sound events together with their temporal activities and spatial locations. In this paper, a track-wise ensemble event independent network with a novel data augmentation method is proposed. The proposed model is based on our previously proposed Event-Independent Network V2 (EINV2) and is extended with conformer blocks and dense blocks. The track-wise ensemble model is proposed to address a problem specific to ensembling models with a track-wise output format: the order of the output tracks (track permutation) may differ among models. The data augmentation approach contains several data augmentation chains, each composed of a random combination of several data augmentation operations. The method also uses log-mel spectrograms, intensity vectors, and Spatial Cues-Augmented Log-Spectrogram (SALSA) as input features for different models. We evaluate the proposed method on the SELD task of the L3DAS22 challenge and obtain the top-ranking solution, with a location-dependent F-score of 0.699. The source code has been released.
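The track permutation problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' track-wise ensemble model; it is a generic permutation-alignment ensemble under hypothetical assumptions: each model outputs an array of shape (tracks, frames, features), one model serves as the reference, and every other model's tracks are reordered frame by frame to minimize squared error against the reference before averaging. The names `align_tracks` and `track_wise_average` are illustrative only.

```python
from itertools import permutations

import numpy as np


def align_tracks(reference: np.ndarray, candidate: np.ndarray) -> np.ndarray:
    """Reorder the track axis of `candidate` to best match `reference`.

    Hypothetical layout: both arrays are (num_tracks, num_frames, num_features).
    The best permutation is chosen independently for every frame, because
    track order may also change over time.
    """
    num_tracks, num_frames, _ = reference.shape
    perms = [list(p) for p in permutations(range(num_tracks))]
    aligned = np.empty_like(candidate)
    for t in range(num_frames):
        # Squared error between the reference frame and each reordered candidate frame.
        costs = [np.sum((reference[:, t] - candidate[perm, t]) ** 2) for perm in perms]
        best = perms[int(np.argmin(costs))]
        aligned[:, t] = candidate[best, t]
    return aligned


def track_wise_average(outputs: list) -> np.ndarray:
    """Average several models' track-wise outputs after aligning them to the first one."""
    reference = outputs[0]
    aligned = [reference] + [align_tracks(reference, out) for out in outputs[1:]]
    return np.mean(np.stack(aligned), axis=0)
```

Averaging without alignment would mix features from unrelated tracks whenever two models assign the same event to different tracks. The brute-force search over permutations is only practical for a small number of tracks; a Hungarian-style assignment would scale better.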

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	9196-9200
Number of pages	5
ISBN (Electronic)	9781665405409
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9747283
Publication status	Published - 2022
Externally published	Yes
Event	47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Hybrid, Singapore (23 May 2022 → 27 May 2022)

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (Print)	1520-6149

Conference

Conference	47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory	Singapore
City	Hybrid
Period	23/05/22 → 27/05/22

Keywords

Sound event localization and detection
data augmentation chains
event-independent network
track-wise ensemble model


Cite this

Hu, J., Cao, Y., Wu, M., Kong, Q., Yang, F., Plumbley, M. D., & Yang, J. (2022). A TRACK-WISE ENSEMBLE EVENT INDEPENDENT NETWORK FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 9196-9200). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP43922.2022.9747283


@inproceedings{190d40093feb4aeda7f57919817cee65,

title = "A TRACK-WISE ENSEMBLE EVENT INDEPENDENT NETWORK FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION",

abstract = "Polyphonic sound event localization and detection (SELD) aims to detect the types of sound events together with their temporal activities and spatial locations. In this paper, a track-wise ensemble event independent network with a novel data augmentation method is proposed. The proposed model is based on our previously proposed Event-Independent Network V2 (EINV2) and is extended with conformer blocks and dense blocks. The track-wise ensemble model is proposed to address a problem specific to ensembling models with a track-wise output format: the order of the output tracks (track permutation) may differ among models. The data augmentation approach contains several data augmentation chains, each composed of a random combination of several data augmentation operations. The method also uses log-mel spectrograms, intensity vectors, and Spatial Cues-Augmented Log-Spectrogram (SALSA) as input features for different models. We evaluate the proposed method on the SELD task of the L3DAS22 challenge and obtain the top-ranking solution, with a location-dependent F-score of 0.699. The source code has been released.",

keywords = "Sound event localization and detection, data augmentation chains, event-independent network, track-wise ensemble model",

author = "Jinbo Hu and Yin Cao and Ming Wu and Qiuqiang Kong and Feiran Yang and Plumbley, {Mark D.} and Jun Yang",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE; 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 ; Conference date: 23-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.1109/ICASSP43922.2022.9747283",

language = "English",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "9196--9200",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

}

