We present a multi-stream Dynamic Bayesian Network model with Articulatory Features (AF_AV_DBN) for audio-visual speech recognition. The conditional probability distributions of the nodes are defined taking into account the asynchronies between the articulatory features (AFs). Speech recognition experiments are carried out on an audio-visual connected-digit database. Results show that, compared with the state-synchronous DBN model (SS_DBN) and the state-asynchronous DBN model (SA_DBN), the AF_AV_DBN model achieves the highest recognition rates when the asynchrony constraint between the AFs is appropriately set, improving the average recognition rate to 89.38% from 87.02% (SS_DBN) and 88.32% (SA_DBN). Moreover, the audio-visual multi-stream AF_AV_DBN model greatly improves the robustness of the audio-only AF_A_DBN model: for example, at -10 dB noise, the recognition rate improves from 20.75% to 76.24%.
Jiang, D, Wu, P, Wang, F, Sahli, H & Verhelst, W 2010, Audio Visual Speech Recognition Based on Multi-Stream DBN Models with Articulatory Features. in Proceedings of the 7th International Symposium on Chinese Spoken Language Processing - ISCSLP2010. pp. 190-193.
Jiang, D., Wu, P., Wang, F., Sahli, H., & Verhelst, W. (2010). Audio Visual Speech Recognition Based on Multi-Stream DBN Models with Articulatory Features. In Proceedings of the 7th International Symposium on Chinese Spoken Language Processing - ISCSLP2010 (pp. 190-193).
@inproceedings{355f8c3c0c8c4160b5e4d1ce6b281636,
title = "Audio Visual Speech Recognition Based on Multi-Stream DBN Models with Articulatory Features",
abstract = "We present a multi-stream Dynamic Bayesian Network model with Articulatory Features (AF_AV_DBN) for audio-visual speech recognition. The conditional probability distributions of the nodes are defined taking into account the asynchronies between the articulatory features (AFs). Speech recognition experiments are carried out on an audio-visual connected-digit database. Results show that, compared with the state-synchronous DBN model (SS_DBN) and the state-asynchronous DBN model (SA_DBN), the AF_AV_DBN model achieves the highest recognition rates when the asynchrony constraint between the AFs is appropriately set, improving the average recognition rate to 89.38% from 87.02% (SS_DBN) and 88.32% (SA_DBN). Moreover, the audio-visual multi-stream AF_AV_DBN model greatly improves the robustness of the audio-only AF_A_DBN model: for example, at -10 dB noise, the recognition rate improves from 20.75% to 76.24%.",
keywords = "Audiovisual Speech Processing",
author = "Dongmei Jiang and Peng Wu and Fengna Wang and Hichem Sahli and Werner Verhelst",
year = "2010",
month = nov,
day = "29",
language = "English",
isbn = "978-1-4244-6244-5",
pages = "190--193",
booktitle = "Proceedings of the 7th International Symposium on Chinese Spoken Language Processing - ISCSLP2010",
}