Acoustic Scene Generation with Conditional SampleRNN

Qiuqiang Kong, Yong Xu, Turab Iqbal, Yin Cao, Wenwu Wang, Mark D. Plumbley

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

12 Citations (Scopus)

Abstract

Acoustic scene generation (ASG) is the task of generating waveforms for acoustic scenes. ASG can be used to generate audio scenes for movies and computer games. Recently, neural networks such as SampleRNN have been used for speech and music generation. However, ASG is more challenging because acoustic scenes vary widely, and evaluating a generative model is also difficult. In this paper, we propose to use a conditional SampleRNN model to generate acoustic scenes conditioned on the input classes. We also propose objective criteria, based on classification accuracy, to evaluate the quality and diversity of the generated samples. Experiments on the DCASE 2016 Task 1 acoustic scene data show that a classification accuracy of 65.5% is achieved on the generated audio samples, compared with 6.7% for samples generated by a random model and 83.1% for real recordings. A classifier trained only on generated samples achieves an accuracy of 51.3%, compared with 6.7% when trained on samples generated by a random model.
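
The accuracy-based criteria in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `accuracy` and the toy label lists are hypothetical, and the abstract does not state which data each classifier is tested on. The sketch assumes that a generated clip counts as correct when a classifier predicts the class it was conditioned on. DCASE 2016 Task 1 defines 15 scene classes, so uniform guessing gives 1/15, which is consistent with the 6.7% reported for the random model.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of clips whose predicted class equals the target class."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(target) == 0:
        raise ValueError("at least one clip is required")
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


# DCASE 2016 Task 1 has 15 acoustic scene classes.
NUM_CLASSES = 15

# Expected accuracy of uniform random guessing over 15 classes.
chance_accuracy = 1 / NUM_CLASSES

# Hypothetical toy data: the classes four generated clips were conditioned on,
# and the classes a scene classifier assigned to those clips.
conditioned_classes = [0, 1, 2, 3]
classifier_outputs = [0, 1, 2, 0]

toy_accuracy = accuracy(classifier_outputs, conditioned_classes)

print(f"chance level: {chance_accuracy:.1%}")
print(f"toy accuracy: {toy_accuracy:.1%}")
```

The same function would apply to the second criterion, where the classifier is trained only on generated samples; only the training data changes, not the accuracy computation.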

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 925-929
Number of pages: 5
ISBN (Electronic): 9781479981311
Publication status: Published - May 2019
Externally published: Yes
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 to 17/05/19

Keywords

  • SampleRNN
  • acoustic scene generation
  • generative model
  • recurrent neural network
