Audio and Visual Speech Recognition Recent Trends

Lee Hao Wei; Seng Kah Phooi; Ang Li Minn

doi:10.4018/978-1-4666-3958-4.ch002

Lee Hao Wei^*, Seng Kah Phooi, Ang Li Minn

^*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Chapter › peer-review

1 Citation (Scopus)

Abstract

This chapter gives a brief introduction to the origins of audio-visual speech recognition and to the techniques commonly used by researchers in the field. It discusses background theory on common feature extraction and classification methods for both audio and visual processing, highlighting Mel-Frequency Cepstral Coefficients (MFCCs) and contour/geometric-based lip feature extraction with corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). The proposed solution concepts include time derivatives of MFCCs for audio feature extraction, chroma-colour-based (YCbCr) face segmentation, feature point extraction, a localized active contour tracking algorithm, and Hidden Markov Models incorporating the Viterbi algorithm. The chapter aims to be informative for newcomers to speech processing, but it is not sufficient on its own for mastery of the field; the suggested additional reading should help expedite that mastery.

Original language	English
Title of host publication	Intelligent Image and Video Interpretation
Subtitle of host publication	Algorithms and Applications
Publisher	IGI Global
Pages	42-86
Number of pages	45
ISBN (Electronic)	9781466639591
ISBN (Print)	146663958X, 9781466639614
DOIs	https://doi.org/10.4018/978-1-4666-3958-4.ch002
Publication status	Published - 30 Apr 2013
Externally published	Yes

Cite this

@inbook{cc4a303e1dc4478c841ee71606729ecd,

title = "Audio and Visual Speech Recognition Recent Trends",

abstract = "This chapter focuses on a brief introduction on the origins of the audio-visual speech recognition process and relevant techniques often used by researchers in the field. Brief background theory regarding commonly used methods for feature extraction and classification for both audio and visual processing are discussed with highlights pertaining to Mel-Frequency Cepstral Coefficient, and contour/geometric based lips feature extraction with corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). Proposed solution concepts will include time derivatives of mel-frequency cepstral coefficients for audio feature extraction, Chroma-colour-based (YCbCr) Face segmentation, Feature Point extraction, Localized Active Contour tracking algorithm, and Hidden Markov Models with Viterbi algorithm incorporated. Information contained in this chapter focuses on being informative for novice speech processing candidates but insufficient mastery knowledge. Additional suggested reading materials should assist in expediting field mastery.",

author = "Wei, {Lee Hao} and Phooi, {Seng Kah} and Minn, {Ang Li}",

note = "Publisher Copyright: {\textcopyright} 2013, IGI Global.",

year = "2013",

month = apr,

day = "30",

doi = "10.4018/978-1-4666-3958-4.ch002",

language = "English",

isbn = "146663958X",

pages = "42--86",

booktitle = "Intelligent Image and Video Interpretation",

publisher = "IGI Global",

}

TY - CHAP

T1 - Audio and Visual Speech Recognition Recent Trends

AU - Wei, Lee Hao

AU - Phooi, Seng Kah

AU - Minn, Ang Li

PY - 2013/4/30

Y1 - 2013/4/30

N2 - This chapter focuses on a brief introduction on the origins of the audio-visual speech recognition process and relevant techniques often used by researchers in the field. Brief background theory regarding commonly used methods for feature extraction and classification for both audio and visual processing are discussed with highlights pertaining to Mel-Frequency Cepstral Coefficient, and contour/geometric based lips feature extraction with corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). Proposed solution concepts will include time derivatives of mel-frequency cepstral coefficients for audio feature extraction, Chroma-colour-based (YCbCr) Face segmentation, Feature Point extraction, Localized Active Contour tracking algorithm, and Hidden Markov Models with Viterbi algorithm incorporated. Information contained in this chapter focuses on being informative for novice speech processing candidates but insufficient mastery knowledge. Additional suggested reading materials should assist in expediting field mastery.

AB - This chapter focuses on a brief introduction on the origins of the audio-visual speech recognition process and relevant techniques often used by researchers in the field. Brief background theory regarding commonly used methods for feature extraction and classification for both audio and visual processing are discussed with highlights pertaining to Mel-Frequency Cepstral Coefficient, and contour/geometric based lips feature extraction with corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). Proposed solution concepts will include time derivatives of mel-frequency cepstral coefficients for audio feature extraction, Chroma-colour-based (YCbCr) Face segmentation, Feature Point extraction, Localized Active Contour tracking algorithm, and Hidden Markov Models with Viterbi algorithm incorporated. Information contained in this chapter focuses on being informative for novice speech processing candidates but insufficient mastery knowledge. Additional suggested reading materials should assist in expediting field mastery.

UR - http://www.scopus.com/inward/record.url?scp=84944050870&partnerID=8YFLogxK

U2 - 10.4018/978-1-4666-3958-4.ch002

DO - 10.4018/978-1-4666-3958-4.ch002

M3 - Chapter

AN - SCOPUS:84944050870

SN - 146663958X

SN - 9781466639614

SP - 42

EP - 86

BT - Intelligent Image and Video Interpretation

PB - IGI Global

ER -
