Publication Details
Ercheng Pei, Yong Zhao, Meshia Oveneke, Meshia Oveneke, Dongmei Jiang, Hichem Sahli

IEEE Transactions on Multimedia

Contribution To Journal


Continuous affective state estimation from facial information is a task which requires the prediction of time series of emotional state outputs from a facial image sequence. Modeling the spatial-temporal evolution of facial information plays an important role in affective state estimation. One of the most widely used methods is Recurrent Neural Networks (RNN). RNNs provide an attractive framework for propagating information over a sequence using a continuous-valued hidden layer representation. In this work, we propose to instead learn rich affective state dynamics. We model human affect as a dynamical system and define the affective state in terms of valence, arousal and their higher-order derivatives. We then pose the affective state estimation problem as a jointly trained state estimator for high-dimensional input images, combining an RNN and a Bayesian Filter, i.e. Kalman filters (KF) and Extended Kalman filters (EKF), so that all weights in the resulting network can be trained using backpropagation. We use a recently proposed general framework for designing and learning discriminative state estimators framed as computational graphs. Such approach can handle high dimensional observations and efficiently optimize, in an end-to-end fashion, the state estimator. In addition, to deal with the asynchrony between emotion labels and input images, caused by the inherent reaction lag of the annotators, we introduce a convolutional layer that aligns features with emotion labels. Experimental results, on the RECOLA and SEMAINE datasets for continuous emotion prediction, illustrate the potential of the proposed framework compared to recent state-of-the-art models.

DOI scopus