This paper presents a novel unsupervised method for identifying the semantic structure in long semi-structured video streams. We identify chains, i.e., local clusters of repeated features from both the video stream and audio transcripts. Each chain serves as an indicator that the temporal interval it demarcates is part of the same semantic event. By layering all the chains over each other, dense regions emerge from the overlapping chains, from which we can identify the semantic structure of the video. We present two clustering strategies that accomplish this task, and compare them against a baseline Scene Transition Graph approach. We then develop a commentator that provides a semantic labeling of the resultant video segmentation.
Poulisse, G-J, Patsis, G, Moens, M-F & Furht, B (ed.) 2014, 'Unsupervised scene detection and commentator building using multi-modal chains', Multimedia Tools and Applications, vol. 70, no. 1, pp. 159-175. <http://link.springer.com/article/10.1007%2Fs11042-012-1086-0>
Poulisse, G.-J., Patsis, G., Moens, M.-F., & Furht, B. (Ed.) (2014). Unsupervised scene detection and commentator building using multi-modal chains. Multimedia Tools and Applications, 70(1), 159-175. http://link.springer.com/article/10.1007%2Fs11042-012-1086-0
@article{ac06dda29eee4fea91d8c16863f830ed,
title = "Unsupervised scene detection and commentator building using multi-modal chains",
abstract = "This paper presents a novel unsupervised method for identifying the semantic structure in long semi-structured video streams. We identify chains, i.e., local clusters of repeated features from both the video stream and audio transcripts. Each chain serves as an indicator that the temporal interval it demarcates is part of the same semantic event. By layering all the chains over each other, dense regions emerge from the overlapping chains, from which we can identify the semantic structure of the video. We present two clustering strategies that accomplish this task, and compare them against a baseline Scene Transition Graph approach. We then develop a commentator that provides a semantic labeling of the resultant video segmentation.",
keywords = "Semantic event detection, Feature extraction, Multi-modal scene segmentation, Video summarization",
author = "Gert-Jan Poulisse and Georgios Patsis and Marie-Francine Moens",
editor = "Borko Furht",
year = "2014",
month = may,
day = "1",
language = "English",
volume = "70",
pages = "159--175",
journal = "Multimedia Tools and Applications",
issn = "1573-7721",
publisher = "Springer Netherlands",
number = "1",
}