基于迁移学习与强化学习的自动音频标注系统

Translated title of the contribution: Automated Audio Caption System Based on Transfer Learning and Reinforcement Learning

Gengyun Chen, Shengchen Li, Xi Shao*, Xinhao Mei, Xubo Liu, Qiushi Huang, Wenwu Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The automated audio captioning task aims to generate a text description of a piece of audio. The system is based on an encoder-decoder architecture consisting of a Convolutional Neural Network (CNN) encoder and a Transformer decoder. To address the mismatch between the evaluation metrics and the training loss function, reinforcement learning is investigated as a way to generate more accurate captions. In addition, the encoder is pre-trained via transfer learning. The Clotho dataset was chosen as the dataset for this work. The results show that both techniques improve the performance of the captioning system.
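The mismatch mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the reinforcement learning algorithm used, so the code below assumes a generic REINFORCE-style objective with a baseline, which is one common way to optimize a caption metric directly. The function names, and the idea of passing in a precomputed metric score as `reward`, are hypothetical and chosen only for illustration.

```python
import math


def sequence_log_prob(token_probs):
    """Sum of the log-probabilities the decoder assigned to each token."""
    return sum(math.log(p) for p in token_probs)


def cross_entropy_loss(reference_token_probs):
    """Negative log-likelihood of the reference caption.

    This is the usual training loss; it never looks at the evaluation metric.
    """
    return -sequence_log_prob(reference_token_probs)


def policy_gradient_loss(sampled_token_probs, reward, baseline):
    """REINFORCE surrogate loss for one sampled caption (an assumption here).

    `reward` is the evaluation-metric score of the sampled caption and
    `baseline` is a reference score used to reduce variance. Minimizing the
    loss raises the probability of captions that score above the baseline.
    """
    advantage = reward - baseline
    return -advantage * sequence_log_prob(sampled_token_probs)


if __name__ == "__main__":
    probs = [0.5, 0.5]  # toy per-token probabilities, not real model output
    print(cross_entropy_loss(probs))
    print(policy_gradient_loss(probs, reward=0.8, baseline=0.5))
```

In this form the metric enters the loss only through `reward`, so the metric itself does not need to be differentiable.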

Original language: Chinese (Simplified)
Pages (from-to): 520-526
Number of pages: 7
Journal: Journal of Fudan University (Natural Science)
Volume: 61
Issue number: 5
Publication status: Published - Oct 2022

Keywords

  • automated audio caption
  • deep learning
  • reinforcement learning
  • transfer learning
