Improved voice activity detection for speech recognition system

Siew Wen Chin; Kah Phooi Seng; Li Minn Ang; King Hann Lim

doi:10.1109/COMPSYM.2010.5685456

Materials and Manufacturing Engineering

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

6 Citations (Scopus)

Abstract

This paper presents an improved voice activity detection (VAD) method for speech recognition systems, based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT). The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm classifies speech versus non-speech using the RBF NN. Once a transition from speech to non-speech, or vice versa, is detected, the energy changes of the CWT coefficients are calculated to localize the final coordinates of the starting and ending speech points. Conventional VAD with a binary classifier classifies the speech signal from the MFCC at the frame level, which easily captures undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall automatic speech recognition (ASR) rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.
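To make the comparison baseline concrete, the following is a minimal sketch of a conventional frame-level VAD built on short-term signal energy and zero-crossing rate, the two classical features the abstract names. It is not the authors' RBF-CWT method and is not taken from the paper: the function names, the non-overlapping framing, the thresholds (`energy_thresh`, `energy_floor`, `zcr_thresh`), and the decision rule are all illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def short_term_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def split_frames(signal: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the signal into non-overlapping frames, dropping any short tail."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def baseline_vad(signal: Sequence[float],
                 frame_len: int = 4,
                 energy_thresh: float = 0.5,   # hypothetical threshold
                 energy_floor: float = 0.05,   # hypothetical threshold
                 zcr_thresh: float = 0.5) -> List[bool]:  # hypothetical threshold
    """Label each frame as speech (True) or non-speech (False).

    Assumed rule: a frame is speech if its energy is high, or if its energy
    is moderate and its zero-crossing rate is high.
    """
    labels = []
    for frame in split_frames(signal, frame_len):
        energy = short_term_energy(frame)
        zcr = zero_crossing_rate(frame)
        loud = energy > energy_thresh
        weak_but_active = energy > energy_floor and zcr > zcr_thresh
        labels.append(loud or weak_but_active)
    return labels


def speech_endpoints(labels: Sequence[bool],
                     frame_len: int) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample_exclusive) from first to last speech frame."""
    speech = [i for i, is_speech in enumerate(labels) if is_speech]
    if not speech:
        return None
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len


if __name__ == "__main__":
    # Toy signal: silence, a burst of alternating samples, then silence.
    signal = [0.0] * 8 + [1.0, -1.0, 1.0, -1.0] * 2 + [0.0] * 8
    labels = baseline_vad(signal)
    print(labels)
    print(speech_endpoints(labels, frame_len=4))
```

On the toy signal, frames 2 and 3 are labeled as speech and the endpoints come out as samples 8 to 16. Because each frame is judged on its own, a single noisy frame can flip the decision, which is the frame-level weakness the abstract attributes to conventional VAD.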

Original language	English
Title of host publication	ICS 2010 - International Computer Symposium
Pages	518-523
Number of pages	6
DOIs	https://doi.org/10.1109/COMPSYM.2010.5685456
Publication status	Published - 2010
Externally published	Yes
Event	2010 International Computer Symposium, ICS 2010 - Tainan, Taiwan, Province of China; Duration: 16 Dec 2010 → 18 Dec 2010

Publication series

Name	ICS 2010 - International Computer Symposium

Conference

Conference	2010 International Computer Symposium, ICS 2010
Country/Territory	Taiwan, Province of China
City	Tainan
Period	16/12/10 → 18/12/10

Keywords

Continuous wavelet transform
Mel frequency cepstral coefficient
Radial basis function
Voice activity detection

Cite this

@inproceedings{ed7d01adf86c457682e53a9c48a292c0,

title = "Improved voice activity detection for speech recognition system",

abstract = "This paper presents an improved voice activity detection (VAD) method for speech recognition systems, based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT). The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm classifies speech versus non-speech using the RBF NN. Once a transition from speech to non-speech, or vice versa, is detected, the energy changes of the CWT coefficients are calculated to localize the final coordinates of the starting and ending speech points. Conventional VAD with a binary classifier classifies the speech signal from the MFCC at the frame level, which easily captures undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall automatic speech recognition (ASR) rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.",

keywords = "Continuous wavelet transform, Mel frequency cepstral coefficient, Radial basis function, Voice activity detection",

author = "Chin, {Siew Wen} and Seng, {Kah Phooi} and Ang, {Li Minn} and Lim, {King Hann}",

year = "2010",

doi = "10.1109/COMPSYM.2010.5685456",

language = "English",

isbn = "9781424476404",

series = "ICS 2010 - International Computer Symposium",

pages = "518--523",

booktitle = "ICS 2010 - International Computer Symposium",

note = "2010 International Computer Symposium, ICS 2010 ; Conference date: 16-12-2010 Through 18-12-2010",

}

TY - GEN

T1 - Improved voice activity detection for speech recognition system

AU - Chin, Siew Wen

AU - Seng, Kah Phooi

AU - Ang, Li Minn

AU - Lim, King Hann

PY - 2010

Y1 - 2010

AB - This paper presents an improved voice activity detection (VAD) method for speech recognition systems, based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT). The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm classifies speech versus non-speech using the RBF NN. Once a transition from speech to non-speech, or vice versa, is detected, the energy changes of the CWT coefficients are calculated to localize the final coordinates of the starting and ending speech points. Conventional VAD with a binary classifier classifies the speech signal from the MFCC at the frame level, which easily captures undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall automatic speech recognition (ASR) rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.

KW - Continuous wavelet transform

KW - Mel frequency cepstral coefficient

KW - Radial basis function

KW - Voice activity detection

UR - http://www.scopus.com/inward/record.url?scp=79851483774&partnerID=8YFLogxK

U2 - 10.1109/COMPSYM.2010.5685456

DO - 10.1109/COMPSYM.2010.5685456

M3 - Conference Proceeding

AN - SCOPUS:79851483774

SN - 9781424476404

T3 - ICS 2010 - International Computer Symposium

SP - 518

EP - 523

BT - ICS 2010 - International Computer Symposium

T2 - 2010 International Computer Symposium, ICS 2010

Y2 - 16 December 2010 through 18 December 2010

ER -
