A new approach of audio emotion recognition

Chien Shing Ooi; Kah Phooi Seng; Li Minn Ang; Li Wern Chew

doi:10.1016/j.eswa.2014.03.026

Corresponding author: Chien Shing Ooi

Research output: Contribution to journal › Article › peer-review

105 Citations (Scopus)

Abstract

A new architecture for intelligent audio emotion recognition is proposed in this paper. It fully utilizes both prosodic and spectral features in its design. It has two main paths in parallel and can recognize six emotions. Path 1 is designed based on an intensive analysis of different prosodic features, from which significant prosodic features are identified to differentiate emotions. Path 2 is designed based on an analysis of spectral features: Mel-Frequency Cepstral Coefficient (MFCC) feature extraction is followed by Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant Analysis (LDA) and Radial Basis Function (RBF) neural classification. This path consists of three parallel BDPCA + LDA + RBF sub-paths, each of which handles two emotions. Fusion modules are also proposed for weight assignment and decision making. The performance of the proposed architecture is evaluated on the eNTERFACE'05 and RML databases. Simulation results and comparisons show that the proposed recognizer performs well.
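The abstract names BDPCA as the dimensionality-reduction step applied to MFCC features but does not define it. The following is a minimal sketch of the commonly used BDPCA formulation (PCA along both the column and the row direction of a 2-D feature matrix, giving Y = W_col^T A W_row), offered as an illustration only and not as the authors' implementation. Assumptions: each utterance's MFCCs form an m x n matrix (frames x coefficients) of fixed size, the leading eigenvectors are found with a simple power iteration, and all function names, sizes and the toy data are hypothetical. The LDA and RBF stages are not shown.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def orthonormalize(w: Vector, basis: List[Vector]) -> Optional[Vector]:
    """Remove the components of w along each orthonormal basis vector, then normalize."""
    for u in basis:
        proj = sum(x * y for x, y in zip(w, u))
        w = [x - proj * y for x, y in zip(w, u)]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 1e-12 else None


def top_eigenvectors(s: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix, returned as the columns of an
    n x k matrix. Uses power iteration kept orthogonal to the vectors already found."""
    n = len(s)
    rng = random.Random(0)  # fixed seed so the starting vectors are reproducible
    found: List[Vector] = []
    for _ in range(k):
        v = orthonormalize([rng.random() + 0.1 for _ in range(n)], found)
        for _ in range(iters):
            w = orthonormalize([sum(s[i][j] * v[j] for j in range(n)) for i in range(n)], found)
            if w is None:  # no remaining variance outside the vectors already found
                break
            v = w
        found.append(v)
    return transpose(found)


def bdpca_fit(samples: List[Matrix], k_row: int, k_col: int) -> Tuple[Matrix, Matrix]:
    """Fit BDPCA projections (W_col, W_row) from equally sized m x n feature matrices."""
    count, m, n = len(samples), len(samples[0]), len(samples[0][0])
    mean = [[sum(a[i][j] for a in samples) / count for j in range(n)] for i in range(m)]
    s_row = [[0.0] * n for _ in range(n)]  # row-direction scatter, n x n
    s_col = [[0.0] * m for _ in range(m)]  # column-direction scatter, m x m
    for a in samples:
        d = [[a[i][j] - mean[i][j] for j in range(n)] for i in range(m)]
        dt = transpose(d)
        for target, prod in ((s_row, matmul(dt, d)), (s_col, matmul(d, dt))):
            for i, row in enumerate(prod):
                for j, val in enumerate(row):
                    target[i][j] += val / count
    return top_eigenvectors(s_col, k_col), top_eigenvectors(s_row, k_row)


def bdpca_transform(a: Matrix, w_col: Matrix, w_row: Matrix) -> Matrix:
    """Project an m x n matrix to a k_col x k_row feature matrix: Y = W_col^T A W_row."""
    return matmul(matmul(transpose(w_col), a), w_row)


# Hypothetical toy data: 5 "MFCC matrices" of 4 frames x 3 coefficients each.
rng = random.Random(1)
samples = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)] for _ in range(5)]
w_col, w_row = bdpca_fit(samples, k_row=2, k_col=2)
features = bdpca_transform(samples[0], w_col, w_row)
print(len(features), len(features[0]))  # 2 2
```

Reducing both directions of the matrix keeps the scatter matrices small (n x n and m x m) compared with vectorizing the whole MFCC matrix first. In a full pipeline the reduced `features` would then be passed to the LDA and RBF stages; a production version would normally use a library eigensolver such as `numpy.linalg.eigh` instead of power iteration.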

Original language	English
Pages (from-to)	5858-5869
Number of pages	12
Journal	Expert Systems with Applications
Volume	41
Issue number	13
DOIs	https://doi.org/10.1016/j.eswa.2014.03.026
Publication status	Published - 1 Oct 2014
Externally published	Yes

Keywords

Audio emotion recognition
MFCC feature
Prosodic features
RBF neural network


Cite this

@article{00d1826a101a44549177853767ba42a4,

title = "A new approach of audio emotion recognition",

abstract = "A new architecture for intelligent audio emotion recognition is proposed in this paper. It fully utilizes both prosodic and spectral features in its design. It has two main paths in parallel and can recognize six emotions. Path 1 is designed based on an intensive analysis of different prosodic features, from which significant prosodic features are identified to differentiate emotions. Path 2 is designed based on an analysis of spectral features: Mel-Frequency Cepstral Coefficient (MFCC) feature extraction is followed by Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant Analysis (LDA) and Radial Basis Function (RBF) neural classification. This path consists of three parallel BDPCA + LDA + RBF sub-paths, each of which handles two emotions. Fusion modules are also proposed for weight assignment and decision making. The performance of the proposed architecture is evaluated on the eNTERFACE'05 and RML databases. Simulation results and comparisons show that the proposed recognizer performs well.",

keywords = "Audio emotion recognition, MFCC feature, Prosodic features, RBF neural network",

author = "Ooi, {Chien Shing} and Seng, {Kah Phooi} and Ang, {Li Minn} and Chew, {Li Wern}",

note = "Funding Information: The authors would like to thank the OCP foundation for the financial support through the doctoral program from Mohammed VI Polytechnic University.",

year = "2014",

month = oct,

day = "1",

doi = "10.1016/j.eswa.2014.03.026",

language = "English",

volume = "41",

pages = "5858--5869",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier",

number = "13",

}

TY - JOUR

T1 - A new approach of audio emotion recognition

AU - Ooi, Chien Shing

AU - Seng, Kah Phooi

AU - Ang, Li Minn

AU - Chew, Li Wern

N1 - Funding Information: The authors would like to thank the OCP foundation for the financial support through the doctoral program from Mohammed VI Polytechnic University.

PY - 2014/10/1

Y1 - 2014/10/1

N2 - A new architecture for intelligent audio emotion recognition is proposed in this paper. It fully utilizes both prosodic and spectral features in its design. It has two main paths in parallel and can recognize six emotions. Path 1 is designed based on an intensive analysis of different prosodic features, from which significant prosodic features are identified to differentiate emotions. Path 2 is designed based on an analysis of spectral features: Mel-Frequency Cepstral Coefficient (MFCC) feature extraction is followed by Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant Analysis (LDA) and Radial Basis Function (RBF) neural classification. This path consists of three parallel BDPCA + LDA + RBF sub-paths, each of which handles two emotions. Fusion modules are also proposed for weight assignment and decision making. The performance of the proposed architecture is evaluated on the eNTERFACE'05 and RML databases. Simulation results and comparisons show that the proposed recognizer performs well.

AB - A new architecture for intelligent audio emotion recognition is proposed in this paper. It fully utilizes both prosodic and spectral features in its design. It has two main paths in parallel and can recognize six emotions. Path 1 is designed based on an intensive analysis of different prosodic features, from which significant prosodic features are identified to differentiate emotions. Path 2 is designed based on an analysis of spectral features: Mel-Frequency Cepstral Coefficient (MFCC) feature extraction is followed by Bi-directional Principal Component Analysis (BDPCA), Linear Discriminant Analysis (LDA) and Radial Basis Function (RBF) neural classification. This path consists of three parallel BDPCA + LDA + RBF sub-paths, each of which handles two emotions. Fusion modules are also proposed for weight assignment and decision making. The performance of the proposed architecture is evaluated on the eNTERFACE'05 and RML databases. Simulation results and comparisons show that the proposed recognizer performs well.

KW - Audio emotion recognition

KW - MFCC feature

KW - Prosodic features

KW - RBF neural network

UR - http://www.scopus.com/inward/record.url?scp=84899710850&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2014.03.026

DO - 10.1016/j.eswa.2014.03.026

M3 - Article

AN - SCOPUS:84899710850

SN - 0957-4174

VL - 41

SP - 5858

EP - 5869

JO - Expert Systems with Applications

JF - Expert Systems with Applications

IS - 13

ER -
