TY - GEN
T1 - Gabor based lipreading with a new audiovisual Mandarin corpus
AU - Xu, Yan
AU - Li, Yuexuan
AU - Abel, Andrew
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data; however, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to improve learning time. This paper presents an improved character-level Gabor-based lipreading system that uses visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The Gabor-based lipreading system is trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, providing a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.
KW - Audiovisual
KW - Chinese
KW - Gabor transform
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85080922074&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-39431-8_16
DO - 10.1007/978-3-030-39431-8_16
M3 - Conference Proceeding
AN - SCOPUS:85080922074
SN - 9783030394301
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 169
EP - 179
BT - Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings
A2 - Ren, Jinchang
A2 - Hussain, Amir
A2 - Zhao, Huimin
A2 - Cai, Jun
A2 - Chen, Rongjun
A2 - Xiao, Yinyin
A2 - Huang, Kaizhu
A2 - Zheng, Jiangbin
PB - Springer
T2 - 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019
Y2 - 13 July 2019 through 14 July 2019
ER -