Publication Details
Ercheng Pei, Meshia Oveneke, Yong Zhao, Dongmei Jiang, Hichem Sahli

IEEE Transactions on Multimedia

Contribution To Journal


Automated facial expression analysis from image sequences for continuous emotion recognition is a very challenging task due to the loss of the three-dimensional information during the image formation process. State-of-the-art relied on estimating dynamic textures features and convolutional neural network features to derive spatio-temporal features. Despite their great success, such features are insensitive to micro facial muscle deformations and are affected by identity, face pose, illumination variation, and self-occlusion. In this work, we argue that retrieving, from image sequences, 3D facial spatio-temporal information, which describes the natural facial muscle deformation, provides a semantical and efficient way of representation and is useful for emotion recognition. In this paper, we propose a framework for extracting three-dimensional facial spatio-temporal features from monocular image sequences using an extended 3D Morphable Model (3DMM) which disentangles the identity factor from the facial expressions of a specific person. An LSTM model is used to evaluate the effectiveness of the proposed spatio-temporal features on video-based facial expression recognition task and continuous affect recognition task. Experimental results, on the AFEW6.0 datasets for facial expression recognition, and the RECOLA and SEMAINE datasets for continuous emotion prediction, illustrate the potential of the proposed 3D spatio-temporal features for facial expressions analysis and continuous affect recognition, as well as their efficiency compared to recent state-of-the-art features.