Using Feature Visualisation for Explaining Deep Learning Models in Visual Speech

Timothy Israel Santos, Andrew Abel

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

The use of Deep Neural Network (DNN) models for Visual Speech Recognition (VSR) has recently been gaining traction. More complex DNN models have greatly improved accuracy, but at the cost of very poor explainability, and there is still much room for improvement in DNN-based VSR compared with audio-only speech recognition. Being able to explain a model and its predictions is therefore important in VSR, both for improving model design and for handling real-world data. This paper highlights various deep learning techniques for visual speech recognition and reports on experiments applying feature visualisation techniques to these models, demonstrating that the CNNs self-learn features consistent with what we would expect.
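As a rough illustration of the kind of feature visualisation referred to here (a gradient-based saliency map, not necessarily the authors' exact method), the sketch below shows how per-pixel importance can be computed for a CNN classifier; `model`, `image`, and `target_class` are assumed placeholders.

```python
import torch

def saliency_map(model, image, target_class):
    """Gradient-based saliency: magnitude of d(class score)/d(pixel).

    `model` is any CNN classifier returning class logits; `image` is a
    (1, C, H, W) float tensor. Both are illustrative placeholders.
    """
    model.eval()
    image = image.clone().requires_grad_(True)

    logits = model(image)              # forward pass
    score = logits[0, target_class]    # score for the class of interest
    score.backward()                   # gradients w.r.t. the input pixels

    # Collapse channels: per-pixel importance is the maximum absolute
    # gradient across colour channels.
    saliency = image.grad.detach().abs().max(dim=1)[0]
    return saliency.squeeze(0)         # (H, W) importance map
```

Overlaying such a map on the input mouth-region frames is one way to check whether the network attends to visually relevant articulator regions rather than background.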

Original language: English
Title of host publication: 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 231-235
Number of pages: 5
ISBN (Electronic): 9781728112824
DOIs
Publication status: Published - 10 May 2019
Event: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019 - Suzhou, China
Duration: 15 Mar 2019 – 18 Mar 2019

Publication series

Name: 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019

Conference

Conference: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Country/Territory: China
City: Suzhou
Period: 15/03/19 – 18/03/19

Keywords

  • Deep neural networks
  • artificial intelligence
  • feature engineering
  • model interpretability
  • saliency map
  • visual speech recognition
