Article

From Hard to Soft

(2017)
DOI: 10.1145/3123266.3123383

Abstract

Over the last decade, automatic emotion recognition has become well established. The gold standard target is thereby usually calculated based on multiple annotations from different raters. All related efforts assume that the emotional state of a human subject can be identified by a 'hard' category or a unique value. This assumption tries to ease the human observer's subjectivity when observing patterns such as the emotional state of others. However, as the number of annotators cannot be infinite, uncertainty remains in the emotion target even if calculated from several, yet few human annotators. The common procedure to use this same emotion target in the learning process thus inevitably introduces noise in terms of an uncertain learning target. In this light, we propose a 'soft' prediction framework to provide a more human-like and comprehensive prediction of emotion. In our novel framework, we provide an additional target to indicate the uncertainty of human perception based on the inter-rater disagreement level, in contrast to the traditional framework which is merely producing one single prediction (category or value). To exploit the dependency between the emotional state and the newly introduced perception uncertainty, we implement a multi-task learning strategy. To evaluate the feasibility and effectiveness of the proposed soft prediction framework, we perform extensive experiments on a time- and value-continuous spontaneous audiovisual emotion database including late fusion results. We show that the soft prediction framework with multi-task learning of the emotional state and its perception uncertainty significantly outperforms the individual tasks in both the arousal and valence dimensions.
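
The idea of pairing the usual emotion target with a second, uncertainty target can be illustrated with a minimal sketch. The abstract does not state how the disagreement level is computed or how the two tasks are weighted, so everything specific here is an assumption: the per-frame standard deviation across raters stands in for the inter-rater disagreement, mean squared error stands in for the training loss, and the names `soft_targets`, `multitask_loss`, and `weight` are hypothetical.

```python
from statistics import mean, pstdev


def soft_targets(annotations):
    """Derive two per-frame targets from several raters' traces.

    annotations: list of rater traces, each a list of floats of equal length
    (e.g. arousal values over time). Returns (emotion, uncertainty).
    """
    if not annotations or len({len(trace) for trace in annotations}) != 1:
        raise ValueError("need at least one rater and equal-length traces")
    # Group the values that all raters assigned to the same time step.
    frames = list(zip(*annotations))
    # 'Hard' target: the average rating per frame.
    emotion = [mean(frame) for frame in frames]
    # Assumed disagreement measure: population standard deviation per frame.
    uncertainty = [pstdev(frame) for frame in frames]
    return emotion, uncertainty


def multitask_loss(pred_emotion, pred_uncertainty, emotion, uncertainty, weight=0.5):
    """Assumed joint objective: MSE on emotion plus weighted MSE on uncertainty."""
    def mse(pred, target):
        return mean((p - t) ** 2 for p, t in zip(pred, target))

    return mse(pred_emotion, emotion) + weight * mse(pred_uncertainty, uncertainty)


if __name__ == "__main__":
    # Three hypothetical raters, two time steps.
    ratings = [[0.0, 0.2], [0.4, 0.2], [0.2, 0.2]]
    emotion, uncertainty = soft_targets(ratings)
    print(emotion, uncertainty)
```

In this toy input the raters disagree on the first frame and agree on the second, so the uncertainty target is non-zero only for the first frame. A model trained on both outputs would be penalised for errors on either one, which is the general shape of a multi-task setup; the actual architecture and loss used in the paper may differ.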
