Probabilistic Breadth as an Evaluation Measure of Gaussian Mixture Models used for Acoustic Emotion States
Automatic speech recognition and speech-based emotion recognition rely on statistical learning methods that are usually highly tuned. Using both the content and the emotional information of spoken utterances makes it possible to build human-machine communication that exhibits characteristics of cognitive systems. Although the underlying learning methods are well known, interpreting or evaluating the resulting classifiers is usually challenging. Such classifiers identify categorical regions in n-dimensional feature spaces by modelling the observation probability with mixtures of multivariate Gaussian densities. We present an approach that allows a more detailed interpretation of the classifier and provides insight into the method. The approach is based on the breadth of the resulting Gaussian model, which can be generated from the mixture models given by the classifier. We introduce the method and present first results on the EmoDB corpus using a classifier with seven mixtures per emotion. In this exemplary case, the classification performance is 64.48% unweighted average recall over all Leave-One-Speaker-Out tests. Investigating the probability models, we draw first conclusions on the characteristics of the Gaussian mixtures, using the breadth as the only parameter.
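To make the idea of a single Gaussian generated from a mixture more concrete, the following minimal sketch collapses a one-dimensional Gaussian mixture into one Gaussian by moment matching and reports its standard deviation. This abstract does not define the breadth formally, so treating it as the standard deviation of the moment-matched Gaussian is an assumption made here for illustration only and may differ from the definition used in the paper. The function names `collapse_gmm_1d` and `breadth` and all numeric values are hypothetical.

```python
import math


def collapse_gmm_1d(weights, means, variances):
    """Moment-match a 1-D Gaussian mixture to a single Gaussian.

    Returns the mean and variance of the mixture distribution, which are
    also the parameters of the matching single Gaussian.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalise so the weights sum to 1
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Law of total variance: within-component plus between-component spread.
    var = sum(wi * (vi + (mi - mean) ** 2)
              for wi, mi, vi in zip(w, means, variances))
    return mean, var


def breadth(weights, means, variances):
    """ASSUMPTION: breadth taken as the std. dev. of the collapsed Gaussian."""
    _, var = collapse_gmm_1d(weights, means, variances)
    return math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical seven-component mixture for one emotion and one feature.
    weights = [0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.10]
    means = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
    variances = [0.5, 0.4, 0.3, 0.3, 0.3, 0.4, 0.5]
    print("breadth:", breadth(weights, means, variances))
```

For multivariate features, the same moment matching applies per dimension when the covariance matrices are diagonal. A scalar summary of a full covariance matrix would need a further design choice that this sketch does not make.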