Abstract
The recognition of speech and of emotions from speech relies on statistical learning
methods that are usually highly tuned. With this kind of technology, human-machine
communication and interaction become possible, since the machine can extract both the content
and the emotional information from spoken utterances. This offers the opportunity to build
machines that exhibit characteristics of cognitive systems. The underlying recognisers are based
on well-known learning methods. Interpreting or evaluating such classifiers, however, is
usually challenging. We therefore present an approach that allows a more
detailed interpretation of the classifier and provides insight into the method. Our approach is
based on the breadth of the resulting Gaussian model, which can be generated from the mixture
models given by the classifier. We introduce the method and present first results on the EmoDB
corpus using a simple classifier with seven mixtures per emotion. Although this is only a showcase,
the classifier reaches an averaged unweighted average recall of 64.48%. Investigating these
models, we draw first conclusions on the characteristics of the Gaussian model, using the
breadth as the only parameter.
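
The unweighted average recall used as the performance measure above is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it contains. A minimal sketch of that computation follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognised samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall per class, then a plain (unweighted) mean over the classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy example with hypothetical labels: recall is 2/3 for "anger", 1.0 for "joy".
y_true = ["anger", "anger", "anger", "joy"]
y_pred = ["anger", "anger", "joy", "joy"]
uar = unweighted_average_recall(y_true, y_pred)
```

In the toy example, plain accuracy would be 3/4, whereas the unweighted average recall is (2/3 + 1) / 2, which shows how the measure compensates for unequal class sizes.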