TY - GEN
T1 - Improved voice activity detection for speech recognition system
AU - Chin, Siew Wen
AU - Seng, Kah Phooi
AU - Ang, Li Minn
AU - Lim, King Hann
PY - 2010
Y1 - 2010
N2 - This paper presents an improved voice activity detection (VAD) method for speech recognition systems, based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT). The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm classifies speech and non-speech using the RBF NN. When a transition from speech to non-speech, or vice versa, occurs, the energy changes of the CWT coefficients are calculated to localize the final positions of the starting and ending speech points. Conventional VAD with a binary classifier classifies the speech signal using the MFCC at the frame level, which easily captures a large amount of undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall automatic speech recognition (ASR) rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.
AB - This paper presents an improved voice activity detection (VAD) method for speech recognition systems, based on a radial basis function neural network (RBF NN) and the continuous wavelet transform (CWT). The input speech signal is analyzed in fixed-size windows using Mel-frequency cepstral coefficients (MFCC). Within each windowed signal, the proposed RBF-CWT VAD algorithm classifies speech and non-speech using the RBF NN. When a transition from speech to non-speech, or vice versa, occurs, the energy changes of the CWT coefficients are calculated to localize the final positions of the starting and ending speech points. Conventional VAD with a binary classifier classifies the speech signal using the MFCC at the frame level, which easily captures a large amount of undesired noise; the proposed RBF NN, aided by the CWT, instead analyzes the transformation of the MFCC at the window level, which offers better compensation for noisy signals. Simulation results show improved speech detection precision and overall automatic speech recognition (ASR) rate, particularly under noisy conditions, compared with conventional VAD based on the zero-crossing rate, short-term signal energy, and a binary classifier.
KW - Continuous wavelet transform
KW - Mel frequency cepstral coefficient
KW - Radial basis function
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=79851483774&partnerID=8YFLogxK
U2 - 10.1109/COMPSYM.2010.5685456
DO - 10.1109/COMPSYM.2010.5685456
M3 - Conference Proceeding
AN - SCOPUS:79851483774
SN - 9781424476404
T3 - ICS 2010 - International Computer Symposium
SP - 518
EP - 523
BT - ICS 2010 - International Computer Symposium
T2 - 2010 International Computer Symposium, ICS 2010
Y2 - 16 December 2010 through 18 December 2010
ER -