Cycle-Consistent Generative Adversarial Network Architectures for Audio Visual Speech Recognition

Yibo He, Kah Phooi Seng, Li Minn Ang, Xingyu Zhao

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

Abstract

Generative Adversarial Networks (GANs) have found extensive applications in image classification and image generation. Nevertheless, using them to recognise and detect multimodal images presents considerable difficulties. Audio Visual Speech Recognition (AVSR) is a classic task in multimodal audio-visual sensing, which leverages audio inputs from human speech and aligned visual inputs from lip movements. However, the performance of AVSR is affected by the discrepancies inherent in real-world environments, such as variations in lighting intensity, noise, and sampling devices. To mitigate these challenges, this paper proposes an AVSR architecture based on a specially constructed Cycle-Consistent Adversarial Network (CycleGAN). First, on the visual side, we used data augmentation methods such as flipping and rotating to process the video data, increasing the number and variety of samples and thereby the robustness and generalisation capability of the model. Then, since the AVSR dataset was collected in different environments with different styles, we transformed the original images multiple times through the specially constructed CycleGAN module to address the inherent differences between those environments. To validate the approach, we used augmented data from well-known datasets (LRS2, Lip Reading Sentences 2, and LRS3) in the training process. Experimental results validate the correctness and effectiveness of the approach.
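The two ideas named in the abstract, flip/rotate augmentation of video frames and the cycle-consistency constraint behind CycleGAN, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the (T, H, W) clip layout, the use of 90-degree rotations, and the toy brightness-shift "generators" G and F are all assumptions made for illustration only. The loss shown is the standard L1 cycle-consistency term, i.e. mean|F(G(x)) - x| + mean|G(F(y)) - y|.

```python
import numpy as np

def hflip_clip(clip):
    """Mirror every frame of a (T, H, W) clip left to right."""
    return clip[:, :, ::-1]

def rot90_clip(clip, k=1):
    """Rotate every frame by k * 90 degrees (an assumed, simplified rotation)."""
    return np.rot90(clip, k=k, axes=(1, 2))

def cycle_consistency_loss(x, y, G, F):
    """Standard L1 cycle loss: mean|F(G(x)) - x| + mean|G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x)) - x))   # domain X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))  # domain Y -> X -> Y
    return forward + backward

# Hypothetical toy data: a 4-frame clip of 8x8 grayscale frames.
rng = np.random.default_rng(0)
clip = rng.random((4, 8, 8))

# Augmentation enlarges the sample set: original, flipped, rotated.
augmented = [clip, hflip_clip(clip), rot90_clip(clip)]

# Toy stand-ins for the two generators: a brightness shift and its inverse.
G = lambda frames: frames + 0.2      # "environment A" -> "environment B"
F = lambda frames: frames - 0.2      # "environment B" -> "environment A"
F_bad = lambda frames: frames - 0.1  # an imperfect inverse

loss_good = cycle_consistency_loss(clip, clip + 0.2, G, F)
loss_bad = cycle_consistency_loss(clip, clip + 0.2, G, F_bad)
```

In this toy setting the loss is essentially zero when F exactly undoes G and grows when it does not, which is the property a CycleGAN exploits to learn style transfer between environments without paired images.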

Original language: English
Title of host publication: Proceedings of 2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350316728
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023 - Zhengzhou, Henan, China
Duration: 14 Nov 2023 to 17 Nov 2023

Publication series

Name: Proceedings of 2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023

Conference

Conference: 2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023
Country/Territory: China
City: Zhengzhou, Henan
Period: 14/11/23 to 17/11/23

Keywords

  • Generative Adversarial Networks (GANs)
  • audio visual speech recognition
  • deep learning
