A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities

Yee Wan Wong; Sue Inn Ch'Ng; Kah Phooi Seng; Li Minn Ang; Siew Wen Chin; Wei Jen Chew; King Hann Lim

doi:10.1016/j.patrec.2011.06.011

Yee Wan Wong*, Sue Inn Ch'Ng, Kah Phooi Seng, Li Minn Ang, Siew Wen Chin, Wei Jen Chew, King Hann Lim

*Corresponding author for this work

Materials and Manufacturing Engineering

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Audio-visual recognition systems are becoming popular because they overcome certain problems of traditional audio-only recognition systems. However, visual variations in video sequences can significantly degrade recognition performance, and the problem is further complicated when more than one visual variation occurs at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in video sequences. To facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database was created. It contains visual variations in illumination, facial expression, head pose, and image resolution. Its most distinctive aspect is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken at slow and normal speech paces to improve the learning process of audio-visual speech recognition systems. Hence, this database is useful for the development of robust audio-visual person recognition, speech recognition, and face recognition systems.

Original language	English
Pages (from-to)	1503-1510
Number of pages	8
Journal	Pattern Recognition Letters
Volume	32
Issue number	13
DOIs	https://doi.org/10.1016/j.patrec.2011.06.011
Publication status	Published - 1 Oct 2011
Externally published	Yes

Keywords

Audio-visual database
Face recognition
Speech recognition
Visual variation

Cite this

@article{4b49babea21047828c57fcd734386b73,
  title = "A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities",
  abstract = "Audio-visual recognition system is becoming popular because it overcomes certain problems of traditional audio-only recognition system. However, difficulties due to visual variations in video sequence can significantly degrade the recognition performance of the system. This problem can be further complicated when more than one visual variation happen at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in video sequence. With the aim to facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database is created. This database contains various visual variations including illumination, facial expression, head pose, and image resolution variations. The most unique aspect of this database is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken in slow and normal speech pace to improve the learning process of audio-visual speech recognition system. Hence, this database is useful for the development of robust audio-visual person, speech recognition and face recognition systems.",
  keywords = "Audio-visual database, Face recognition, Speech recognition, Visual variation",
  author = "Wong, {Yee Wan} and Ch'Ng, {Sue Inn} and Seng, {Kah Phooi} and Ang, {Li Minn} and Chin, {Siew Wen} and Chew, {Wei Jen} and Lim, {King Hann}",
  year = "2011",
  month = oct,
  day = "1",
  doi = "10.1016/j.patrec.2011.06.011",
  language = "English",
  volume = "32",
  pages = "1503--1510",
  journal = "Pattern Recognition Letters",
  issn = "0167-8655",
  number = "13",
}

TY  - JOUR
T1  - A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities
AU  - Wong, Yee Wan
AU  - Ch'Ng, Sue Inn
AU  - Seng, Kah Phooi
AU  - Ang, Li Minn
AU  - Chin, Siew Wen
AU  - Chew, Wei Jen
AU  - Lim, King Hann
PY  - 2011/10/1
Y1  - 2011/10/1
N2  - Audio-visual recognition system is becoming popular because it overcomes certain problems of traditional audio-only recognition system. However, difficulties due to visual variations in video sequence can significantly degrade the recognition performance of the system. This problem can be further complicated when more than one visual variation happen at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in video sequence. With the aim to facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database is created. This database contains various visual variations including illumination, facial expression, head pose, and image resolution variations. The most unique aspect of this database is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken in slow and normal speech pace to improve the learning process of audio-visual speech recognition system. Hence, this database is useful for the development of robust audio-visual person, speech recognition and face recognition systems.
AB  - Audio-visual recognition system is becoming popular because it overcomes certain problems of traditional audio-only recognition system. However, difficulties due to visual variations in video sequence can significantly degrade the recognition performance of the system. This problem can be further complicated when more than one visual variation happen at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in video sequence. With the aim to facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database is created. This database contains various visual variations including illumination, facial expression, head pose, and image resolution variations. The most unique aspect of this database is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken in slow and normal speech pace to improve the learning process of audio-visual speech recognition system. Hence, this database is useful for the development of robust audio-visual person, speech recognition and face recognition systems.
KW  - Audio-visual database
KW  - Face recognition
KW  - Speech recognition
KW  - Visual variation
UR  - http://www.scopus.com/inward/record.url?scp=79960251538&partnerID=8YFLogxK
U2  - 10.1016/j.patrec.2011.06.011
DO  - 10.1016/j.patrec.2011.06.011
M3  - Article
AN  - SCOPUS:79960251538
SN  - 0167-8655
VL  - 32
SP  - 1503
EP  - 1510
JO  - Pattern Recognition Letters
JF  - Pattern Recognition Letters
IS  - 13
ER  -
