Using Feature Visualisation for Explaining Deep Learning Models in Visual Speech

Timothy Israel Santos, Andrew Abel

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

The use of Deep Neural Network (DNN) models for Visual Speech Recognition (VSR) has recently been gaining traction. More complex DNN models have greatly improved accuracy, but at the cost of very poor explainability, and there is still much room for improvement in DNN-based VSR compared with audio-only speech recognition. Being able to explain a model and its predictions is therefore important in VSR, both for improving model design and for handling real-world data. This paper highlights various deep learning techniques for visual speech recognition and reports on experiments applying feature visualisation techniques to these models, demonstrating that the CNNs self-learn features consistent with what we would expect.
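As a rough illustration of the kind of feature visualisation referred to here (a gradient-based saliency map, not necessarily the authors' exact method), the sketch below shows how per-pixel importance can be computed for a CNN classifier; `model`, `image`, and `target_class` are assumed placeholders.

```python
import torch

def saliency_map(model, image, target_class):
    """Gradient-based saliency: magnitude of d(class score)/d(pixel).

    `model` is any CNN classifier returning class logits; `image` is a
    (1, C, H, W) float tensor. Both are illustrative placeholders.
    """
    model.eval()
    image = image.clone().requires_grad_(True)

    logits = model(image)              # forward pass
    score = logits[0, target_class]    # score for the class of interest
    score.backward()                   # gradients w.r.t. the input pixels

    # Collapse channels: per-pixel importance is the maximum absolute
    # gradient across colour channels.
    saliency = image.grad.detach().abs().max(dim=1)[0]
    return saliency.squeeze(0)         # (H, W) importance map
```

Overlaying such a map on the input mouth-region frames is one way to check whether the network attends to visually relevant articulator regions rather than background.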

Original language: English
Title of host publication: 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 231-235
Number of pages: 5
ISBN (Electronic): 9781728112824
DOIs
Publication status: Published - 10 May 2019
Event: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019 - Suzhou, China
Duration: 15 Mar 2019 – 18 Mar 2019

Publication series

Name: 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019

Conference

Conference: 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Country/Territory: China
City: Suzhou
Period: 15/03/19 – 18/03/19

Keywords

  • Deep neural networks
  • artificial intelligence
  • feature engineering
  • model interpretability
  • saliency map
  • visual speech recognition
