Gabor based lipreading with a new audiovisual mandarin corpus

Yan Xu; Yuexuan Li; Andrew Abel

doi:10.1007/978-3-030-39431-8_16

Yan Xu, Yuexuan Li, Andrew Abel^*

^*Corresponding author for this work

Department of Computing

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

5 Citations (Scopus)

Abstract

Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data; however, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to improve learning time. This paper presents an improved character-level Gabor-based lipreading system, using visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The Gabor-based lipreading system has been trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, which provides a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.
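
The abstract reports a Character Error Rate (CER) without defining it. As a minimal sketch, assuming the common definition (total edit distance between reference and predicted character sequences, divided by the total number of reference characters), the metric could be computed as below. The function names and the sample Pinyin strings are illustrative assumptions and are not taken from the paper, which may define CER differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """CER under the assumed definition: summed edits / summed reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars


# Hypothetical Pinyin labels, used only to exercise the functions.
refs = ["ni", "hao"]
hyps = ["li", "hao"]
cer = character_error_rate(refs, hyps)  # 1 substitution over 5 reference characters
```

Under this definition CER counts edits over whole character sequences, so it is not simply one minus a classification accuracy, and the two figures need not sum to 100%.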

Original language	English
Title of host publication	Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings
Editors	Jinchang Ren, Amir Hussain, Huimin Zhao, Jun Cai, Rongjun Chen, Yinyin Xiao, Kaizhu Huang, Jiangbin Zheng
Publisher	Springer
Pages	169-179
Number of pages	11
ISBN (Print)	9783030394301
DOIs	https://doi.org/10.1007/978-3-030-39431-8_16
Publication status	Published - 2020
Event	10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 - Guangzhou, China; Duration: 13 Jul 2019 → 14 Jul 2019

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11691 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	10th International Conference on Brain Inspired Cognitive Systems, BICS 2019
Country/Territory	China
City	Guangzhou
Period	13/07/19 → 14/07/19

Keywords

Audiovisual
Chinese
Gabor transform
Speech recognition

Access to Document

10.1007/978-3-030-39431-8_16

Cite this

Xu, Y., Li, Y., & Abel, A. (2020). Gabor based lipreading with a new audiovisual mandarin corpus. In J. Ren, A. Hussain, H. Zhao, J. Cai, R. Chen, Y. Xiao, K. Huang, & J. Zheng (Eds.), Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings (pp. 169-179). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11691 LNAI). Springer. https://doi.org/10.1007/978-3-030-39431-8_16

Xu, Yan ; Li, Yuexuan ; Abel, Andrew. / Gabor based lipreading with a new audiovisual mandarin corpus. Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings. editor / Jinchang Ren ; Amir Hussain ; Huimin Zhao ; Jun Cai ; Rongjun Chen ; Yinyin Xiao ; Kaizhu Huang ; Jiangbin Zheng. Springer, 2020. pp. 169-179 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{27770e52c77149fa806811cbbcaa85f6,

title = "Gabor based lipreading with a new audiovisual mandarin corpus",

abstract = "Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data; however, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to improve learning time. This paper presents an improved character-level Gabor-based lipreading system, using visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The Gabor-based lipreading system has been trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, which provides a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.",

keywords = "Audiovisual, Chinese, Gabor transform, Speech recognition",

author = "Yan Xu and Yuexuan Li and Andrew Abel",

note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2020.; 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 ; Conference date: 13-07-2019 Through 14-07-2019",

year = "2020",

doi = "10.1007/978-3-030-39431-8_16",

language = "English",

isbn = "9783030394301",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "11691 LNAI",

publisher = "Springer",

pages = "169--179",

editor = "Jinchang Ren and Amir Hussain and Huimin Zhao and Jun Cai and Rongjun Chen and Yinyin Xiao and Kaizhu Huang and Jiangbin Zheng",

booktitle = "Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings",

}

Xu, Y, Li, Y & Abel, A 2020, Gabor based lipreading with a new audiovisual mandarin corpus. in J Ren, A Hussain, H Zhao, J Cai, R Chen, Y Xiao, K Huang & J Zheng (eds), Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11691 LNAI, Springer, pp. 169-179, 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019, Guangzhou, China, 13/07/19. https://doi.org/10.1007/978-3-030-39431-8_16

Gabor based lipreading with a new audiovisual mandarin corpus. / Xu, Yan; Li, Yuexuan; Abel, Andrew.
Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings. ed. / Jinchang Ren; Amir Hussain; Huimin Zhao; Jun Cai; Rongjun Chen; Yinyin Xiao; Kaizhu Huang; Jiangbin Zheng. Springer, 2020. p. 169-179 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11691 LNAI).

TY - GEN

T1 - Gabor based lipreading with a new audiovisual mandarin corpus

AU - Xu, Yan

AU - Li, Yuexuan

AU - Abel, Andrew

N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.

PY - 2020

Y1 - 2020

AB - Human speech processing is a multimodal and cognitive activity in which visual information plays a role. Many lipreading systems use English speech data; however, Chinese is the most spoken language in the world and is of increasing interest, as is the development of lightweight feature extraction to improve learning time. This paper presents an improved character-level Gabor-based lipreading system, using visual information for feature extraction and speech classification. We evaluate this system with a new Audiovisual Mandarin Chinese (AVMC) database composed of 4704 characters spoken by 10 volunteers. The Gabor-based lipreading system has been trained on this dataset and uses the Dlib Region-of-Interest (ROI) method and Gabor filtering to extract lip features, which provides a fast and lightweight approach without any mouth modelling. A character-level Convolutional Neural Network (CNN) is used to recognize Pinyin, with 64.96% accuracy and a Character Error Rate (CER) of 57.71%.

KW - Audiovisual

KW - Chinese

KW - Gabor transform

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85080922074&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-39431-8_16

DO - 10.1007/978-3-030-39431-8_16

M3 - Conference Proceeding

AN - SCOPUS:85080922074

SN - 9783030394301

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 169

EP - 179

BT - Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings

A2 - Ren, Jinchang

A2 - Hussain, Amir

A2 - Zhao, Huimin

A2 - Cai, Jun

A2 - Chen, Rongjun

A2 - Xiao, Yinyin

A2 - Huang, Kaizhu

A2 - Zheng, Jiangbin

PB - Springer

T2 - 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019

Y2 - 13 July 2019 through 14 July 2019

ER -

Xu Y, Li Y, Abel A. Gabor based lipreading with a new audiovisual mandarin corpus. In Ren J, Hussain A, Zhao H, Cai J, Chen R, Xiao Y, Huang K, Zheng J, editors, Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings. Springer. 2020. p. 169-179. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-39431-8_16
