Publication Details
Yan Li, Xiaohan Xia, Dongmei Jiang, Hichem Sahli

ICMI 2020 Companion - Companion Publication of the 2020 International Conference on Multimodal Interaction

Contribution To Book Anthology


Mental health applications are increasingly using audio-visual and physiological measurements to detect the emotional state of a person, with much existing research aiming to detect episodic emotional states. The availability of wearable devices and the signals they provide is attracting researchers to explore the detection of a continuous sequence of emotion categories, referred to as an emotion stream, for understanding mental health. Currently, there are no established databases for experimenting with emotion streams. In this paper, we make two contributions. First, we collect a Multi-modal EMOtion Stream (MEMOS) database in the scenario of social games. Audio-video recordings of the players are made via mobile phones, and aligned Electrocardiogram (ECG) signals are collected by wearable sensors. In total, 40 multi-modal sessions have been recorded, each lasting between 25 and 70 minutes. Emotional states with time boundaries are self-reported and annotated by the participants while watching the video recordings. Second, we propose a two-step emotional state detection framework to automatically determine the emotion categories, with their time boundaries, along the video recordings. Experiments on the MEMOS database provide a baseline result for temporal emotional state detection research, with an average mean-average-precision (mAP) score of 8.109% on detecting five emotions (happiness, sadness, anger, surprise, other negative emotions) in videos. This exceeds the 5.47% obtained when emotions are detected by averaging the frame-level confidence scores (from the Face++ emotion recognition API) over segments produced by a sliding window. We expect this paper to introduce a novel research problem and provide a database for related research.
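The sliding-window comparison baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, stride, and emotion ordering are assumptions, and the per-frame confidence matrix stands in for scores returned by an emotion recognition API such as Face++.

```python
import numpy as np

def sliding_window_scores(frame_conf, win, stride):
    """Average frame-level emotion confidences over sliding windows.

    frame_conf: (n_frames, n_emotions) array of per-frame confidence
    scores (hypothetical stand-in for API output, e.g. Face++).
    win, stride: window length and step, in frames (assumed values).
    Returns (starts, scores): scores[i] holds the mean confidence of
    each emotion over frames [starts[i], starts[i] + win).
    """
    n = frame_conf.shape[0]
    starts = np.arange(0, max(n - win, 0) + 1, stride)
    scores = np.stack([frame_conf[s:s + win].mean(axis=0) for s in starts])
    return starts, scores

# Toy example: 10 frames, 3 emotion classes; frames 4-7 are confident
# in class 1, so the window starting at frame 4 scores highest there.
conf = np.zeros((10, 3))
conf[4:8, 1] = 1.0
starts, scores = sliding_window_scores(conf, win=4, stride=2)
```

Each windowed segment then receives the emotion label with the highest averaged score, and the resulting segment-level detections are evaluated against the self-reported annotations with mAP.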
