Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion

Tianhao Qiao; Shunqing Zhang; Zhichao Zhang; Shan Cao; Shugong Xu

doi:10.1109/SiPS47522.2019.9020418

Shanghai University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

13 Citations (Scopus)

Abstract

Environmental Sound Classification (ESC) is an important and challenging problem in which feature representation is a critical, even decisive, factor. The quality of the representation directly affects classification accuracy, so ESC performance depends heavily on how effective the features extracted from environmental sounds are. In this paper, we propose an ESC framework based on sub-spectrogram segmentation. In addition, we adopt the proposed Convolutional Recurrent Neural Network (CRNN) and score level fusion to jointly improve the classification accuracy. A wide range of truncation schemes is evaluated to find the optimal number of sub-spectrograms and their corresponding band ranges. In numerical experiments, the proposed framework achieves 81.9% classification accuracy on the public ESC-50 dataset, a 9.1% accuracy improvement over traditional baseline schemes.
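The two ideas named in the abstract, cutting a spectrogram into frequency-band sub-spectrograms and fusing per-band classifier scores, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the spectrogram shape (128 bands by 431 frames), the four equal-width band ranges, the equal fusion weights, and the `toy_classifier` stand-in, which is a random linear map rather than a CRNN. Only the class count of 50 follows from the ESC-50 dataset.

```python
import numpy as np

def split_into_sub_spectrograms(spec, band_edges):
    """Cut a (n_bands, n_frames) spectrogram into consecutive frequency-band slices."""
    return [spec[lo:hi, :] for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_scores(score_list, weights=None):
    """Score level fusion as a weighted sum of per-band class probabilities."""
    scores = np.stack(score_list)              # shape: (n_sub, n_classes)
    if weights is None:                        # assumption: equal weights
        weights = np.full(len(score_list), 1.0 / len(score_list))
    return np.asarray(weights, dtype=float) @ scores

rng = np.random.default_rng(0)
spec = rng.random((128, 431))                  # hypothetical spectrogram
band_edges = [0, 32, 64, 96, 128]              # hypothetical band ranges
subs = split_into_sub_spectrograms(spec, band_edges)

def toy_classifier(sub, n_classes=50):
    """Stand-in for a per-band model: random linear map of time-averaged energies."""
    w = rng.standard_normal((n_classes, sub.shape[0]))
    return softmax(w @ sub.mean(axis=1))

scores = [toy_classifier(s) for s in subs]     # one probability vector per band
fused = fuse_scores(scores)                    # fused class probabilities
predicted = int(np.argmax(fused))              # index of the predicted class
```

Because each per-band score vector sums to one and the weights sum to one, the fused vector is again a probability distribution over the classes. Changing `band_edges` is how different truncation schemes would be compared in this sketch.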

Original language	English
Title of host publication	2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	318-323
Number of pages	6
ISBN (Electronic)	9781728119274
DOIs	https://doi.org/10.1109/SiPS47522.2019.9020418
Publication status	Published - Oct 2019
Externally published	Yes
Event	33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019 - Nanjing, China; Duration: 20 Oct 2019 → 23 Oct 2019

Publication series

Name	IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
Volume	2019-October
ISSN (Print)	1520-6130

Conference

Conference	33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019
Country/Territory	China
City	Nanjing
Period	20/10/19 → 23/10/19

Keywords

convolutional recurrent neural network
Environmental sound classification
score level fusion
sub-spectrogram segmentation

Access to Document

10.1109/SiPS47522.2019.9020418

Cite this

Qiao, T., Zhang, S., Zhang, Z., Cao, S., & Xu, S. (2019). Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion. In 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019 (pp. 318-323). Article 9020418 (IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation; Vol. 2019-October). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SiPS47522.2019.9020418

Qiao, Tianhao ; Zhang, Shunqing ; Zhang, Zhichao et al. / Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion. 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 318-323 (IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation).

@inproceedings{59b86a23af614b3d8c2d2c3651f25f33,

title = "Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion",

abstract = "Environmental Sound Classification (ESC) is an important and challenging problem, and feature representation is a critical and even decisive factor in ESC. Feature representation ability directly affects the accuracy of sound classification. Therefore, the ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. In this paper, we propose a sub-spectrogram segmentation based ESC classification framework. In addition, we adopt the proposed Convolutional Recurrent Neural Network (CRNN) and score level fusion to jointly improve the classification accuracy. Extensive truncation schemes are evaluated to find the optimal number and the corresponding band ranges of sub-spectrograms. Based on the numerical experiments, the proposed framework can achieve 81.9% ESC classification accuracy on the public dataset ESC-50, which provides 9.1% accuracy improvement over traditional baseline schemes.",

keywords = "convolutional recurrent neural network, Environmental sound classification, score level fusion, sub-spectrogram segmentation",

author = "Tianhao Qiao and Shunqing Zhang and Zhichao Zhang and Shan Cao and Shugong Xu",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE; 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019; Conference date: 20-10-2019 through 23-10-2019",

year = "2019",

month = oct,

doi = "10.1109/SiPS47522.2019.9020418",

language = "English",

series = "IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "318--323",

booktitle = "2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019",

}

Qiao, T, Zhang, S, Zhang, Z, Cao, S & Xu, S 2019, Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion. in 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019, 9020418, IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation, vol. 2019-October, Institute of Electrical and Electronics Engineers Inc., pp. 318-323, 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019, Nanjing, China, 20/10/19. https://doi.org/10.1109/SiPS47522.2019.9020418

TY - GEN

T1 - Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion

AU - Qiao, Tianhao

AU - Zhang, Shunqing

AU - Zhang, Zhichao

AU - Cao, Shan

AU - Xu, Shugong

PY - 2019/10

Y1 - 2019/10

N2 - Environmental Sound Classification (ESC) is an important and challenging problem, and feature representation is a critical and even decisive factor in ESC. Feature representation ability directly affects the accuracy of sound classification. Therefore, the ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. In this paper, we propose a sub-spectrogram segmentation based ESC classification framework. In addition, we adopt the proposed Convolutional Recurrent Neural Network (CRNN) and score level fusion to jointly improve the classification accuracy. Extensive truncation schemes are evaluated to find the optimal number and the corresponding band ranges of sub-spectrograms. Based on the numerical experiments, the proposed framework can achieve 81.9% ESC classification accuracy on the public dataset ESC-50, which provides 9.1% accuracy improvement over traditional baseline schemes.

AB - Environmental Sound Classification (ESC) is an important and challenging problem, and feature representation is a critical and even decisive factor in ESC. Feature representation ability directly affects the accuracy of sound classification. Therefore, the ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. In this paper, we propose a sub-spectrogram segmentation based ESC classification framework. In addition, we adopt the proposed Convolutional Recurrent Neural Network (CRNN) and score level fusion to jointly improve the classification accuracy. Extensive truncation schemes are evaluated to find the optimal number and the corresponding band ranges of sub-spectrograms. Based on the numerical experiments, the proposed framework can achieve 81.9% ESC classification accuracy on the public dataset ESC-50, which provides 9.1% accuracy improvement over traditional baseline schemes.

KW - convolutional recurrent neural network

KW - Environmental sound classification

KW - score level fusion

KW - sub-spectrogram segmentation

UR - http://www.scopus.com/inward/record.url?scp=85082381356&partnerID=8YFLogxK

U2 - 10.1109/SiPS47522.2019.9020418

DO - 10.1109/SiPS47522.2019.9020418

M3 - Conference Proceeding

AN - SCOPUS:85082381356

T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation

SP - 318

EP - 323

BT - 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 33rd IEEE International Workshop on Signal Processing Systems, SiPS 2019

Y2 - 20 October 2019 through 23 October 2019

ER -

Qiao T, Zhang S, Zhang Z, Cao S, Xu S. Sub-spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion. In 2019 IEEE International Workshop on Signal Processing Systems, SiPS 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 318-323. 9020418. (IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation). doi: 10.1109/SiPS47522.2019.9020418
