Temporal Pattern Classification using Kernel Methods for Speech Recognition and Speech Emotion Recognition
Author : Chandra Sekhar, C.;Chandrakala, S.
Source : Defence Science Journal ; Vol:60(4) ; 2010 ; pp 348-363
Subject : 681.3 Computer Science;681.3:355 Computer Applications in Defence;Defence Science Journal
Keywords : Hidden Markov model;support vector machine;string kernel;Gaussian mixture model;score vector;parametric vector;speech recognition and speech emotion recognition
Abstract : There are two paradigms for modelling varying-length temporal data: modelling sequences of feature vectors, as in hidden Markov model (HMM)-based approaches for speech recognition, and modelling sets of feature vectors, as in Gaussian mixture model (GMM)-based approaches for speech emotion recognition. This paper presents methods that use discrete hidden Markov models (DHMMs) in the kernel feature space, and a string kernel-based support vector machine (SVM) classifier, to classify the discretised representation of a sequence of feature vectors obtained by clustering and vector quantisation in the kernel feature space. The authors then present continuous density hidden Markov models (CDHMMs) in the explicit kernel feature space that use the continuous-valued representation of features extracted from the temporal data. Methods for temporal pattern classification that map a varying-length sequential pattern to a fixed-length sequential pattern and then use an SVM-based classifier are also presented. For the task of recognising spoken letters in the E-set, it is demonstrated that models using a discretised representation and string kernel SVM-based classification can achieve better classification performance than models using the continuous-valued representation. For modelling the sets-of-vectors representation of temporal data, two approaches in a hybrid framework are presented: the score vector-based approach and the segment modelling-based approach. In both approaches, a generative model-based method is used to obtain a fixed-length pattern representation of varying-length temporal data, and a discriminative model is then used for classification. These two approaches are studied for the speech emotion recognition task. The segment modelling-based approach performs better than both the score vector-based approach and GMM-based classifiers for speech emotion recognition.
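
Illustration : The discretise-then-compare idea in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method. The abstract does not say which string kernel is used, so a normalised k-spectrum kernel is assumed here, and vector quantisation is done by plain Euclidean nearest-neighbour lookup rather than in the kernel feature space. The codebook, the sequences, and the function names `quantise`, `spectrum_features` and `spectrum_kernel` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def quantise(sequence, codebook):
    """Map each feature vector to the index of its nearest codebook vector.

    Assumption: plain Euclidean distance in the input space.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            for v in sequence]


def spectrum_features(symbols, k):
    """Count every contiguous length-k substring (k-gram) of a symbol sequence."""
    return Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Normalised k-spectrum kernel: cosine similarity of k-gram count vectors.

    Assumption: the k-spectrum kernel stands in for the unspecified string kernel.
    """
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    dot = sum(count * ft[gram] for gram, count in fs.items())
    norm = sqrt(sum(c * c for c in fs.values()) * sum(c * c for c in ft.values()))
    return dot / norm if norm else 0.0


# Hypothetical 2-D codebook and two feature-vector sequences of different lengths.
codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
seq_a = [(0.1, 0.0), (0.9, 1.1), (2.1, 0.1), (1.0, 0.9)]
seq_b = [(0.0, 0.1), (1.1, 1.0), (1.9, -0.1), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)]

sym_a = quantise(seq_a, codebook)  # [0, 1, 2, 1]
sym_b = quantise(seq_b, codebook)  # [0, 1, 2, 1, 0, 1]
print(spectrum_kernel(sym_a, sym_b, k=2))
```

Because the kernel compares k-gram counts rather than aligned frames, sequences of different lengths can be compared directly. A matrix of such kernel values between training sequences could then be passed to any SVM implementation that accepts a precomputed kernel.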