Abstract
Automated audio captioning aims to generate a text description of an audio clip. The system is based on an encoder-decoder architecture consisting of a Convolutional Neural Network (CNN) encoder and a Transformer decoder. To address the mismatch between the evaluation metrics and the loss function, reinforcement learning is investigated for generating more accurate captions. Furthermore, the encoder is pre-trained via transfer learning. The Clotho dataset is used in our experiments. The results show that both techniques improve the performance of the captioning system.
Translated title of the contribution | Automated Audio Caption System Based on Transfer Learning and Reinforcement Learning |
---|---|
Original language | Chinese (Simplified) |
Pages (from-to) | 520-526 |
Number of pages | 7 |
Journal | Journal of Fudan University (Natural Science) |
Volume | 61 |
Issue number | 5 |
DOIs | |
Publication status | Published - Oct 2022 |
Keywords
- automated audio caption
- deep learning
- reinforcement learning
- transfer learning