TY - JOUR
T1 - Pre-trained DenseNet-121 with Multilayer Perceptron for Acoustic Event Classification
AU - Tan, Pooi Shiang
AU - Lim, Kian Ming
AU - Tan, Cheah Heng
AU - Lee, Chin Poo
N1 - Publisher Copyright:
© 2023, IAENG International Journal of Computer Science. All Rights Reserved.
PY - 2023/3
Y1 - 2023/3
N2 - Acoustic event classification aims to assign acoustic events to the correct classes, which is beneficial in surveillance, multimedia information retrieval, and smart cities. The main challenges of acoustic event classification are insufficient data to learn a good model and the varying lengths of the acoustic input signals. In this paper, a deep learning architecture, namely Pre-trained DenseNet-121 with Multilayer Perceptron, is proposed to classify acoustic events into the correct classes. To mitigate the data scarcity problem, two data augmentation techniques, time stretching and pitch shifting, are applied to the training data to boost the number of training samples. Given the augmented acoustic signal, a frequency spectrogram technique is then employed to represent the acoustic event signal as a fixed-size image representation. The resulting spectrogram images are enriched with information from the acoustic signal, such as energy levels over the time domain, frequency changes, signal strength, and amplitude. Subsequently, a pre-trained DenseNet-121 model is adopted as a transfer learning technique to extract significant features from the spectrogram images. In doing so, computational resources are greatly reduced and the performance of the deep learning-based model is improved. Three benchmark datasets, (1) Soundscapes1, (2) Soundscapes2, and (3) UrbanSound8K, are used to assess the performance of the proposed method. The experimental results show that the proposed Pre-trained DenseNet-121 with Multilayer Perceptron outperforms existing works on the Soundscapes1, Soundscapes2, and UrbanSound8K datasets with F1-scores of 80.7%, 87.3%, and 69.6%, respectively.
AB - Acoustic event classification aims to assign acoustic events to the correct classes, which is beneficial in surveillance, multimedia information retrieval, and smart cities. The main challenges of acoustic event classification are insufficient data to learn a good model and the varying lengths of the acoustic input signals. In this paper, a deep learning architecture, namely Pre-trained DenseNet-121 with Multilayer Perceptron, is proposed to classify acoustic events into the correct classes. To mitigate the data scarcity problem, two data augmentation techniques, time stretching and pitch shifting, are applied to the training data to boost the number of training samples. Given the augmented acoustic signal, a frequency spectrogram technique is then employed to represent the acoustic event signal as a fixed-size image representation. The resulting spectrogram images are enriched with information from the acoustic signal, such as energy levels over the time domain, frequency changes, signal strength, and amplitude. Subsequently, a pre-trained DenseNet-121 model is adopted as a transfer learning technique to extract significant features from the spectrogram images. In doing so, computational resources are greatly reduced and the performance of the deep learning-based model is improved. Three benchmark datasets, (1) Soundscapes1, (2) Soundscapes2, and (3) UrbanSound8K, are used to assess the performance of the proposed method. The experimental results show that the proposed Pre-trained DenseNet-121 with Multilayer Perceptron outperforms existing works on the Soundscapes1, Soundscapes2, and UrbanSound8K datasets with F1-scores of 80.7%, 87.3%, and 69.6%, respectively.
KW - acoustic event classification
KW - DenseNet
KW - frequency spectrogram
KW - multilayer perceptron
KW - pitch shifting
KW - time stretching
UR - http://www.scopus.com/inward/record.url?scp=85149661080&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85149661080
SN - 1819-656X
VL - 50
JO - IAENG International Journal of Computer Science
JF - IAENG International Journal of Computer Science
IS - 1
M1 - IJCS_50_1_07
ER -