Collaborative AI Dysarthric Speech Recognition System With Data Augmentation Using Generative Adversarial Neural Network

Yibo He, Kah Phooi Seng*, Li Minn Ang

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review

2 Citations (Scopus)

Abstract

This paper proposes a novel collaborative dysarthric speech recognition system designed to convert dysarthric speech into non-dysarthric speech to enhance the robustness of automatic speech recognition (ASR) systems fine-tuned for dysarthric speech. The system employs an innovative three-stage data augmentation framework: The first stage collaboratively augments the training dataset by generating static data and high-quality synthetic speech samples using a natural text-to-speech model (Tacotron2). The second stage applies a tempo perturbation technique that simulates the natural variation of speech rhythms by adjusting the playback tempo to improve the model’s adaptability to varying speech speeds. The third stage integrates the Inception-ResNet module with a temporal masking strategy using an enhanced CycleGAN-based conversion model to efficiently map conformal and non-conformal phonological features while preserving the overall speech structure and resolving temporal irregularities. Experiments conducted on the UASpeech corpus demonstrate a significant reduction in the word error rate (WER) compared to the baseline approach. Specifically, the three-stage data enhancement process achieves a reduction in the WER for the fine-tuned Wav2Vec2-XLSR and Whisper-Tiny models by 9.81% and 6.56%, respectively, with an average WER of 13.58% for the best performing system. These results highlight the effectiveness of the collaborative framework in improving the accuracy and naturalness of speech recognition for dysarthria, thereby offering individuals with dysarthria a more natural and intelligible communication experience.
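The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed, the function below divides the word-level edit distance (substitutions, deletions, insertions) by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions. The abstract does not describe the authors' scoring script or any text normalization, so this is not their evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace and compared exactly;
    real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions. A single-word vocabulary task such as isolated-word recognition reduces this metric to a per-utterance 0 or 1 before averaging.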

Original language: English
Pages (from-to): 2097-2111
Number of pages: 15
Journal: IEEE Transactions on Neural Systems and Rehabilitation Engineering
Volume: 33
Publication status: Published - 2025

Keywords

  • Collaborative AI
  • data augmentation
  • deep learning
  • dysarthric speech
  • generative adversarial networks (GANs)
  • speech recognition
