基于迁移学习与强化学习的自动音频标注系统

Gengyun Chen; Shengchen Li; Xi Shao; Xinhao Mei; Xubo Liu; Qiushi Huang; Wenwu Wang

doi:10.15943/j.cnki.fdxb-jns.20221017.005

Translated title of the contribution: Automated Audio Caption System Based on Transfer Learning and Reinforcement Learning

Gengyun Chen, Shengchen Li, Xi Shao^*, Xinhao Mei, Xubo Liu, Qiushi Huang, Wenwu Wang

^*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

Abstract

Automated audio captioning aims to generate a text description of a piece of audio. The proposed system is based on an encoder-decoder architecture consisting of a Convolutional Neural Network (CNN) encoder and a Transformer decoder. To address the mismatch between the evaluation metrics and the loss function, reinforcement learning is investigated as a way to generate more accurate captions. Furthermore, the encoder is pre-trained via transfer learning. The Clotho dataset was chosen as the dataset for this work. The results show that both techniques further improve the performance of the captioning system.
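The mismatch mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which reinforcement learning algorithm the authors use, so the code below assumes a generic REINFORCE-style objective with a baseline. The function names, the reward values, and the token probabilities are all hypothetical and serve only to contrast a token-level cross-entropy loss with a loss driven by a caption-level metric score.

```python
import math


def cross_entropy_loss(token_log_probs):
    """Mean negative log-likelihood of the reference tokens.

    This is the usual training loss: it scores each token on its own
    and never sees the caption-level evaluation metric.
    """
    return -sum(token_log_probs) / len(token_log_probs)


def policy_gradient_loss(sampled_log_probs, sampled_reward, baseline_reward):
    """REINFORCE-style surrogate loss for one sampled caption.

    sampled_reward and baseline_reward stand in for an evaluation metric
    score (assumed values here, not results from the paper). Minimising
    this loss raises the probability of captions that score above the
    baseline and lowers it for captions that score below.
    """
    advantage = sampled_reward - baseline_reward
    return -advantage * sum(sampled_log_probs)


# Hypothetical per-token probabilities the decoder assigns to a reference caption.
ce = cross_entropy_loss([math.log(0.5), math.log(0.25)])

# Hypothetical sampled caption that scores 0.8 on a metric, against a baseline of 0.6.
pg = policy_gradient_loss(
    [math.log(0.5), math.log(0.5)],
    sampled_reward=0.8,
    baseline_reward=0.6,
)

print(f"cross-entropy loss: {ce:.4f}")
print(f"policy-gradient loss: {pg:.4f}")
```

In this sketch the second loss depends directly on the metric score of a whole caption, which is what lets training optimise the quantity used at evaluation time.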

Original language	Chinese (Simplified)
Pages (from-to)	520-526
Number of pages	7
Journal	Journal of Fudan University (Natural Science)
Volume	61
Issue number	5
DOIs	https://doi.org/10.15943/j.cnki.fdxb-jns.20221017.005
Publication status	Published - Oct 2022

Keywords

automated audio caption
deep learning
reinforcement learning
transfer learning

Cite this

@article{adda8719e57e4a17a4a57e49e2d7ec20,

title = "基于迁移学习与强化学习的自动音频标注系统",

abstract = "Automated audio captioning aims to generate a text description of a piece of audio. The proposed system is based on an encoder-decoder architecture consisting of a Convolutional Neural Network (CNN) encoder and a Transformer decoder. To address the mismatch between the evaluation metrics and the loss function, reinforcement learning is investigated as a way to generate more accurate captions. Furthermore, the encoder is pre-trained via transfer learning. The Clotho dataset was chosen as the dataset for this work. The results show that both techniques further improve the performance of the captioning system.",

keywords = "automated audio caption, deep learning, reinforcement learning, transfer learning",

author = "Gengyun Chen and Shengchen Li and Xi Shao and Xinhao Mei and Xubo Liu and Qiushi Huang and Wenwu Wang",

year = "2022",

month = oct,

doi = "10.15943/j.cnki.fdxb-jns.20221017.005",

language = "Simplified Chinese",

volume = "61",

pages = "520--526",

journal = "Journal of Fudan University (Natural Science)",

issn = "0427-7104",

number = "5",

}

TY - JOUR

T1 - 基于迁移学习与强化学习的自动音频标注系统

AU - Chen, Gengyun

AU - Li, Shengchen

AU - Shao, Xi

AU - Mei, Xinhao

AU - Liu, Xubo

AU - Huang, Qiushi

AU - Wang, Wenwu

PY - 2022/10

Y1 - 2022/10

AB - Automated audio captioning aims to generate a text description of a piece of audio. The proposed system is based on an encoder-decoder architecture consisting of a Convolutional Neural Network (CNN) encoder and a Transformer decoder. To address the mismatch between the evaluation metrics and the loss function, reinforcement learning is investigated as a way to generate more accurate captions. Furthermore, the encoder is pre-trained via transfer learning. The Clotho dataset was chosen as the dataset for this work. The results show that both techniques further improve the performance of the captioning system.

KW - automated audio caption

KW - deep learning

KW - reinforcement learning

KW - transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85199433611&partnerID=8YFLogxK

U2 - 10.15943/j.cnki.fdxb-jns.20221017.005

DO - 10.15943/j.cnki.fdxb-jns.20221017.005

M3 - Article

AN - SCOPUS:85199433611

SN - 0427-7104

VL - 61

SP - 520

EP - 526

JO - Journal of Fudan University (Natural Science)

JF - Journal of Fudan University (Natural Science)

IS - 5

ER -
