Deep Neural Network Hidden Markov Models, or DNN-HMMs, have recently emerged as very promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann Machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out on these two models on the eNTERFACE'05 database and Berlin database, respectively, and results are compared with those from the GMM-HMMs, the shallow-NN-HMMs with two layers, as well as the Multi-layer Perceptrons HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers as well as hidden units are properly set, the DNN could extend the labeling ability of GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, for the eNTERFACE'05 database, the recognition accuracy improves 12.22% from the DNN-HMMs with unsupervised pre-training, 11.67% from the GMM-HMMs, 10.56% from the MLP-HMMs, and even 17.22% from the shallow-NN-HMMs, respectively.
Li, L, Zhao, Y, Jiang, D, Zhang, Y, Wang, F, Gonzalez, I, Enescu, V & Sahli, H 2013, Hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition. in 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013). International Conference on Affective Computing and Intelligent Interaction and Workshops, IEEE, pp. 312-317, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013), Geneva, Switzerland, 2/09/13.
Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., Enescu, V., & Sahli, H. (2013). Hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013) (pp. 312-317). (International Conference on Affective Computing and Intelligent Interaction and Workshops). IEEE.
@inproceedings{1c7b54e7c46449a998b3fa9529ebdff1,
title = "Hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition",
abstract = "Deep Neural Network Hidden Markov Models, or DNN-HMMs, have recently emerged as very promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, for emotion recognition from speech, we investigate DNN-HMMs with restricted Boltzmann Machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out on these two models on the eNTERFACE'05 database and Berlin database, respectively, and results are compared with those from the GMM-HMMs, the shallow-NN-HMMs with two layers, as well as the Multi-layer Perceptrons HMMs (MLP-HMMs). Experimental results show that when the numbers of hidden layers as well as hidden units are properly set, the DNN could extend the labeling ability of GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, for the eNTERFACE'05 database, the recognition accuracy improves 12.22% from the DNN-HMMs with unsupervised pre-training, 11.67% from the GMM-HMMs, 10.56% from the MLP-HMMs, and even 17.22% from the shallow-NN-HMMs, respectively.",
keywords = "emotion recognition, machine learning, dnn",
author = "Longfei Li and Yong Zhao and Dongmei Jiang and Yanning Zhang and Fengna Wang and Isabel Gonzalez and Valentin Enescu and Hichem Sahli",
year = "2013",
language = "English",
isbn = "978-0-7695-5048-0",
series = "International Conference on Affective Computing and Intelligent Interaction and Workshops",
publisher = "IEEE",
pages = "312--317",
booktitle = "2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013)",
note = "2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII 2013) ; Conference date: 02-09-2013 Through 05-09-2013",
}