TY - JOUR
T1 - Pre-trained DenseNet-121 with Multilayer Perceptron for Acoustic Event Classification
AU - Tan, Pooi Shiang
AU - Lim, Kian Ming
AU - Tan, Cheah Heng
AU - Lee, Chin Poo
N1 - Publisher Copyright:
© 2023, IAENG International Journal of Computer Science. All Rights Reserved.
PY - 2023/3
Y1 - 2023/3
N2 - Acoustic event classification aims to assign acoustic events to the correct classes, which is beneficial in surveillance, multimedia information retrieval, and smart cities. The main challenges of acoustic event classification are insufficient data to learn a good model and the varying lengths of the acoustic input signals. In this paper, a deep learning architecture, namely Pre-trained DenseNet-121 with Multilayer Perceptron, is proposed to classify acoustic events into the correct classes. To mitigate the data scarcity problem, two data augmentation techniques, time stretching and pitch shifting, are applied to the training data to boost the number of training samples. Given the augmented acoustic signal, a frequency spectrogram technique is then employed to represent the acoustic event signal as a fixed-size image representation. The resulting spectrogram images are enriched with information from the acoustic signal, such as energy levels over the time domain, frequency changes, signal strength, and amplitude. Subsequently, a pre-trained DenseNet-121 model is adopted as a transfer learning technique to extract significant features from the spectrogram images. In doing so, computational resources are greatly reduced and the performance of the deep learning-based model is improved. Three benchmark datasets, (1) Soundscapes1, (2) Soundscapes2, and (3) UrbanSound8K, are used to assess the performance of the proposed method. The experimental results show that the proposed Pre-trained DenseNet-121 with Multilayer Perceptron outperforms existing works on the Soundscapes1, Soundscapes2, and UrbanSound8K datasets with F1-scores of 80.7%, 87.3%, and 69.6%, respectively.
AB - Acoustic event classification aims to assign acoustic events to the correct classes, which is beneficial in surveillance, multimedia information retrieval, and smart cities. The main challenges of acoustic event classification are insufficient data to learn a good model and the varying lengths of the acoustic input signals. In this paper, a deep learning architecture, namely Pre-trained DenseNet-121 with Multilayer Perceptron, is proposed to classify acoustic events into the correct classes. To mitigate the data scarcity problem, two data augmentation techniques, time stretching and pitch shifting, are applied to the training data to boost the number of training samples. Given the augmented acoustic signal, a frequency spectrogram technique is then employed to represent the acoustic event signal as a fixed-size image representation. The resulting spectrogram images are enriched with information from the acoustic signal, such as energy levels over the time domain, frequency changes, signal strength, and amplitude. Subsequently, a pre-trained DenseNet-121 model is adopted as a transfer learning technique to extract significant features from the spectrogram images. In doing so, computational resources are greatly reduced and the performance of the deep learning-based model is improved. Three benchmark datasets, (1) Soundscapes1, (2) Soundscapes2, and (3) UrbanSound8K, are used to assess the performance of the proposed method. The experimental results show that the proposed Pre-trained DenseNet-121 with Multilayer Perceptron outperforms existing works on the Soundscapes1, Soundscapes2, and UrbanSound8K datasets with F1-scores of 80.7%, 87.3%, and 69.6%, respectively.
KW - acoustic event classification
KW - DenseNet
KW - frequency spectrogram
KW - multilayer perceptron
KW - pitch shifting
KW - time stretching
UR - http://www.scopus.com/inward/record.url?scp=85149661080&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85149661080
SN - 1819-656X
VL - 50
JO - IAENG International Journal of Computer Science
JF - IAENG International Journal of Computer Science
IS - 1
M1 - IJCS_50_1_07
ER -