TY - GEN
T1 - Using Feature Visualisation for Explaining Deep Learning Models in Visual Speech
AU - Santos, Timothy Israel
AU - Abel, Andrew
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5/10
Y1 - 2019/5/10
N2 - The use of Deep Neural Network (DNN) models for Visual Speech Recognition (VSR) has recently been gaining traction. The use of more complex DNN models has greatly increased accuracy but has the downside of very poor explainability. There is still much room for improvement in using DNN models for VSR in comparison to audio-only speech recognition. Being able to explain the model and its predictions would be beneficial for improving its performance, and the explainability of predictions is important in VSR in order to further improve the model design and the handling of real-world data. This paper highlights various deep learning techniques for visual speech recognition and reports on experiments using feature visualisation techniques for these models, successfully demonstrating that CNNs learn, on their own, features consistent with what we would expect.
AB - The use of Deep Neural Network (DNN) models for Visual Speech Recognition (VSR) has recently been gaining traction. The use of more complex DNN models has greatly increased accuracy but has the downside of very poor explainability. There is still much room for improvement in using DNN models for VSR in comparison to audio-only speech recognition. Being able to explain the model and its predictions would be beneficial for improving its performance, and the explainability of predictions is important in VSR in order to further improve the model design and the handling of real-world data. This paper highlights various deep learning techniques for visual speech recognition and reports on experiments using feature visualisation techniques for these models, successfully demonstrating that CNNs learn, on their own, features consistent with what we would expect.
KW - Deep neural networks
KW - artificial intelligence
KW - feature engineering
KW - model interpretability
KW - saliency map
KW - visual speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85066615368&partnerID=8YFLogxK
U2 - 10.1109/ICBDA.2019.8713256
DO - 10.1109/ICBDA.2019.8713256
M3 - Conference Proceeding
AN - SCOPUS:85066615368
T3 - 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
SP - 231
EP - 235
BT - 2019 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Big Data Analytics, ICBDA 2019
Y2 - 15 March 2019 through 18 March 2019
ER -