We propose an audiovisual source separation algorithm for speech signals. The algorithm first extracts the time segments with low activity of the mouth region from synchronous video recordings. An automatically selected optimal classifier is used to detect silent intervals within these segments of low visual mouth activity. Then, the source separation problem is formulated and solved for the entire signal duration. Our approach was tested on two challenging speech corpora with two speakers and two microphones: in the first corpus, separate source signals were mixed in a simulated room; the second corpus contains recorded conversations. The results are promising on both corpora: with the visual silence detector, the performance of the source separation algorithm, measured by the signal to noise inference ratio, increases.
Gonzalez, I, Ravyse, I, Verhelst, W, Brouckxon, H, Jiang, D & Sahli, H 2009, A Visual Silence Detector Constraining Speech Source Separation. in Proceedings of the 5th International Conference on Image and Graphics, Xi’an, China. IEEE Computer Society Press, pp. 463-470, Finds and Results from the Swedish Cyprus Expedition: A Gender Perspective at the Medelhavsmuseet, Stockholm, Sweden, 21/09/09.
Gonzalez, I., Ravyse, I., Verhelst, W., Brouckxon, H., Jiang, D., & Sahli, H. (2009). A Visual Silence Detector Constraining Speech Source Separation. In Proceedings of the 5th International Conference on Image and Graphics, Xi’an, China (pp. 463-470). IEEE Computer Society Press.
@inproceedings{3cb04d4cf4fd41e9a3298a6f4b0a1cbd,
title = "A Visual Silence Detector Constraining Speech Source Separation",
abstract = "We propose an audiovisual source separation algorithm for speech signals. The algorithm first extracts the time segments with low activity of the mouth region from synchronous video recordings. An automatically selected optimal classifier is used to detect silent intervals within these segments of low visual mouth activity. Then, the source separation problem is formulated and solved for the entire signal duration. Our approach was tested on two challenging speech corpora with two speakers and two microphones: in the first corpus, separate source signals were mixed in a simulated room; the second corpus contains recorded conversations. The results are promising on both corpora: with the visual silence detector, the performance of the source separation algorithm, measured by the signal to noise inference ratio, increases.",
author = "Isabel Gonzalez and Ilse Ravyse and Werner Verhelst and Henk Brouckxon and Dongmei Jiang and Hichem Sahli",
year = "2009",
month = sep,
day = "24",
language = "English",
isbn = "978-0-7695-3883-9",
pages = "463--470",
booktitle = "Proceedings of the 5th International Conference on Image and Graphics, Xi{\textquoteright}an, China",
publisher = "IEEE Computer Society Press",
note = "Finds and Results from the Swedish Cyprus Expedition: A Gender Perspective at the Medelhavsmuseet ; Conference date: 21-09-2009 Through 25-09-2009",
}