Feature Augmenting Networks for Improving Depression Severity Estimation from Speech Signals
This publication appears in: IEEE Access
Authors: L. Yang, D. Jiang and H. Sahli
Publication Year: 2020
Depression disorder has become one of the major psychological diseases endangering human health. Researchers in the affective computing community are supporting the development of reliable depression severity estimation systems, from multiple modalities (speech, face, text), to assist doctors in their diagnosis. However, the limited amount of annotated data has become the main bottleneck restricting research on depression screening, especially when deep learning models are used. To alleviate this issue, in this work we propose to use a Deep Convolutional Generative Adversarial Network (DCGAN) for feature augmentation to improve depression severity estimation from speech. To the best of our knowledge, this is the first attempt to apply a Generative Adversarial Network to depression severity estimation from speech. In addition, to measure the quality of the augmented features, we propose three different measurement criteria, characterizing the spatial, frequency and representation learning aspects of the augmented features. Finally, the augmented features are used to train depression estimation models. Experiments are carried out on speech signals from the Audio Visual Emotion Challenge (AVEC 2016) depression dataset, and the relationship between model performance and data size is explored. Our experimental results show that: 1) The combination of the three proposed evaluation criteria can effectively and comprehensively evaluate the quality of the augmented features. 2) As the size of the augmented data increases, the performance of depression severity estimation gradually improves and the model converges to a stable state. 3) The proposed DCGAN-based data augmentation approach effectively improves the performance of depression severity estimation, with the root mean square error (RMSE) reduced to 5.520 and the mean absolute error (MAE) reduced to 4.634, which is better than most of the state-of-the-art results on AVEC 2016.
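For readers unfamiliar with the two error metrics reported above, the following is a minimal sketch of how RMSE and MAE are conventionally computed from true and predicted severity scores. It is not the authors' evaluation code; the function names `rmse` and `mae` and the example scores are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical severity scores, invented for illustration (not from the dataset).
true_scores = [2, 10, 15, 7]
pred_scores = [4, 8, 11, 7]

print(f"RMSE = {rmse(true_scores, pred_scores):.3f}")
print(f"MAE  = {mae(true_scores, pred_scores):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large individual mistakes more heavily than MAE, which is why the reported RMSE (5.520) is larger than the reported MAE (4.634).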