Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Bin Liu; Shuai Nie; Yaping Zhang; Dengfeng Ke; Shan Liang; Wenju Liu

doi:10.1109/ICASSP.2018.8462093

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

19 Citations (Scopus)

Abstract

In realistic environments, speech is usually corrupted by various kinds of noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. The most common way to alleviate this issue is to use a well-designed speech enhancement approach as the front-end of ASR. However, such methods require more complex pipelines, more computation and sometimes higher hardware costs (e.g., a microphone array). In addition, speech enhancement can introduce speech distortions and mismatches with the training conditions. In this paper, we propose an adversarial training method to directly boost the noise robustness of the acoustic model. Specifically, a joint scheme combining a generative adversarial net (GAN) and a neural network-based acoustic model (AM) is used in the training phase. The GAN generates clean feature representations from noisy features under the guidance of a discriminator that tries to distinguish true clean signals from generated ones. The joint optimization of the generator, discriminator and AM combines the strengths of both the GAN and the AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of the AM and achieves average relative error rate reductions of 23.38% and 11.54% on the development and test sets, respectively.
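The abstract describes three jointly trained parts: a generator G that maps noisy features toward clean representations, a discriminator D that separates true clean features from generated ones, and an acoustic model (AM) that recognizes speech from the generated features. The framework-free Python sketch that follows shows one way such losses could be combined. It assumes a standard non-saturating GAN loss for D and G and frame-level cross-entropy for the AM; the function names, the trade-off weight `am_weight` and the toy numbers are hypothetical and are not taken from the paper, whose exact objective and architectures are not given in this record.

```python
import math
from typing import Sequence

EPS = 1e-12  # numerical floor that keeps log() finite


def bce(prob: float, target: float) -> float:
    """Binary cross-entropy of one discriminator probability against a 0/1 target."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_clean: Sequence[float], d_generated: Sequence[float]) -> float:
    """D is trained to output 1 on true clean features and 0 on generator outputs."""
    real = sum(bce(p, 1.0) for p in d_clean) / len(d_clean)
    fake = sum(bce(p, 0.0) for p in d_generated) / len(d_generated)
    return real + fake


def am_cross_entropy(frame_posteriors: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Frame-level cross-entropy of the AM, evaluated on generated features."""
    total = sum(-math.log(max(post[y], EPS)) for post, y in zip(frame_posteriors, labels))
    return total / len(labels)


def generator_and_am_loss(
    d_generated: Sequence[float],
    frame_posteriors: Sequence[Sequence[float]],
    labels: Sequence[int],
    am_weight: float = 1.0,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Objective shared by G and the AM: fool D and classify the frames correctly.

    The adversarial term depends only on G; the cross-entropy term depends on
    both G (through the generated features) and the AM, so minimising this sum
    updates the two jointly.
    """
    adversarial = sum(bce(p, 1.0) for p in d_generated) / len(d_generated)
    return adversarial + am_weight * am_cross_entropy(frame_posteriors, labels)


# Toy numbers for illustration only (not results from the paper).
d_out_clean = [0.9, 0.8]                          # D(x_clean)
d_out_generated = [0.3, 0.4]                      # D(G(x_noisy))
posteriors = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]   # AM softmax over 3 hypothetical senones
targets = [0, 1]
print(round(discriminator_loss(d_out_clean, d_out_generated), 4))
print(round(generator_and_am_loss(d_out_generated, posteriors, targets), 4))
```

In a real system G, D and the AM would be neural networks trained with automatic differentiation, typically alternating a step that minimises the discriminator loss with a step that minimises the shared generator/AM loss; this sketch only shows how the terms fit together.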

Original language	English
Title of host publication	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	5034-5038
Number of pages	5
ISBN (Print)	9781538646588
DOIs	https://doi.org/10.1109/ICASSP.2018.8462093
Publication status	Published - 10 Sept 2018
Externally published	Yes
Event	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada. Duration: 15 Apr 2018 → 20 Apr 2018

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2018-April
ISSN (Print)	1520-6149

Conference

Conference	2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory	Canada
City	Calgary
Period	15/04/18 → 20/04/18

Keywords

Acoustic model
Deep adversarial training
Generative adversarial net
Robust speech recognition

Cite this

Liu, B., Nie, S., Zhang, Y., Ke, D., Liang, S., & Liu, W. (2018). Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 5034-5038). Article 8462093 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8462093

Liu, Bin ; Nie, Shuai ; Zhang, Yaping et al. / Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 5034-5038 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{a06385ade53d441a90084094cab150fa,
title = "Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training",
abstract = "In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38\% and 11.54\% on the development and test set, respectively.",
keywords = "Acoustic model, Deep adversarial training, Generative adversarial net, Robust speech recognition",
author = "Bin Liu and Shuai Nie and Yaping Zhang and Dengfeng Ke and Shan Liang and Wenju Liu",
note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 ; Conference date: 15-04-2018 Through 20-04-2018",
year = "2018",
month = sep,
day = "10",
doi = "10.1109/ICASSP.2018.8462093",
language = "English",
isbn = "9781538646588",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "5034--5038",
booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
}

Liu, B, Nie, S, Zhang, Y, Ke, D, Liang, S & Liu, W 2018, Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings, 8462093, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2018-April, Institute of Electrical and Electronics Engineers Inc., pp. 5034-5038, 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 15/04/18. https://doi.org/10.1109/ICASSP.2018.8462093

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training. / Liu, Bin; Nie, Shuai; Zhang, Yaping et al.
2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. p. 5034-5038, 8462093 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April).

TY  - GEN
T1  - Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
AU  - Liu, Bin
AU  - Nie, Shuai
AU  - Zhang, Yaping
AU  - Ke, Dengfeng
AU  - Liang, Shan
AU  - Liu, Wenju
PY  - 2018/9/10
Y1  - 2018/9/10
N2  - In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.
AB  - In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.
KW  - Acoustic model
KW  - Deep adversarial training
KW  - Generative adversarial net
KW  - Robust speech recognition
UR  - http://www.scopus.com/inward/record.url?scp=85054218918&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2018.8462093
DO  - 10.1109/ICASSP.2018.8462093
M3  - Conference Proceeding
AN  - SCOPUS:85054218918
SN  - 9781538646588
T3  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP  - 5034
EP  - 5038
BT  - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2  - 15 April 2018 through 20 April 2018
ER  - 

Liu B, Nie S, Zhang Y, Ke D, Liang S, Liu W. Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2018. p. 5034-5038. 8462093. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/ICASSP.2018.8462093
