CycleGAN∗: Collaborative AI Learning With Improved Adversarial Neural Networks for Multimodalities Data

Yibo He, Kah Phooi Seng*, Li Minn Ang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With the widespread adoption of generative adversarial networks (GANs) for sample generation, this article aims to enhance adversarial neural networks to facilitate collaborative artificial intelligence (AI) learning tailored to datasets containing multiple modalities. A significant portion of the current literature is devoted to sample generation with GANs, with the aim of improving the detection performance of machine learning (ML) classifiers by adding the generated samples to the original training set via adversarial training. The quality of the generated adversarial samples depends on having sufficient training data. In the multimodal domain, however, multimodal data are scarce owing to resource constraints. In this article, we address this challenge by proposing a new multimodal dataset generation approach based on the classical audio-visual speech recognition (AVSR) task, using CycleGAN, DiscoGAN, and StyleGAN2 for exploration and performance comparison. AVSR experiments are conducted on the LRS2 and LRS3 corpora. Our experiments reveal that CycleGAN, DiscoGAN, and StyleGAN2 do not effectively address the low-data problem in AVSR classification. Consequently, we introduce an enhanced model, CycleGAN∗, based on the original CycleGAN, which efficiently learns the features of the original dataset and generates high-quality multimodal data. Experimental results show that the multimodal datasets generated by the proposed CycleGAN∗ yield a significant improvement in word error rate (WER), indicating fewer recognition errors. Notably, the images produced by CycleGAN∗ show markedly improved visual clarity, evidence of its superior generative capability. Furthermore, in contrast to traditional approaches, we underscore the significance of collaborative learning: we implement co-training with diverse multimodal data to facilitate information sharing and complementary learning across modalities. This collaborative approach enhances the model's ability to integrate heterogeneous information, thereby boosting its performance in multimodal environments.
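The abstract reports its gains in word error rate (WER), the standard AVSR/ASR metric: the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch of this computation follows; the function name `wer` and whitespace tokenization are illustrative assumptions, not details from the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed via word-level Levenshtein distance with dynamic programming.
    Tokenization here is plain whitespace splitting (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

A lower WER therefore means fewer recognition errors; for example, one substituted word in a three-word reference gives a WER of 1/3.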

Original language: English
Pages (from-to): 5616-5629
Number of pages: 14
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 11
DOIs
Publication status: Published - 2024

Keywords

  • Audio-visual speech recognition (AVSR)
  • deep learning
  • generative adversarial networks (GANs)

