Recognizing violence in crowded scenes is a major challenge for automatic video surveillance, and there is a growing need for intelligent surveillance systems to strengthen public safety. In this paper we propose an effective approach to recognize violence in crowded videos based on a shallow Convolutional Neural Network (CNN) that is pretrained using an unsupervised layer-wise learning strategy. The pretrained parameters are then fine-tuned to extract intermediate frame representations, which are subsequently aggregated via NetVLAD into video representations used to recognize violence in footage. Experimental evaluation shows that our proposal yields results that are very competitive with those reported in the state of the art.
Diaz Berenguer, A, Oveneke, MC, Alioscha-Perez, M & Sahli, H 2019, Paired supervised learning and unsupervised pretraining of CNN-architecture for violence detection in videos. in Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn2019). vol. 2491, CEUR Workshop Proceedings, BNAIC 2019, Brussels, Belgium, 7 November 2019. <http://ceur-ws.org/Vol-2491/abstract81.pdf>
Diaz Berenguer, A., Oveneke, M. C., Alioscha-Perez, M., & Sahli, H. (2019). Paired supervised learning and unsupervised pretraining of CNN-architecture for violence detection in videos. In Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn2019) (Vol. 2491). CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2491/abstract81.pdf
@inproceedings{4b551c12c7564818bd6b7079db45c4c8,
title = "Paired supervised learning and unsupervised pretraining of CNN-architecture for violence detection in videos",
abstract = "Recognizing violence in crowded scenes is a major challenge for automatic video surveillance, and there is a growing need for intelligent surveillance systems to strengthen public safety. In this paper we propose an effective approach to recognize violence in crowded videos based on a shallow Convolutional Neural Network (CNN) that is pretrained using an unsupervised layer-wise learning strategy. The pretrained parameters are then fine-tuned to extract intermediate frame representations, which are subsequently aggregated via NetVLAD into video representations used to recognize violence in footage. Experimental evaluation shows that our proposal yields results that are very competitive with those reported in the state of the art.",
author = "{Diaz Berenguer}, Abel and Oveneke, {Meshia C{\'e}dric} and Alioscha-Perez, Mitchel and Sahli, Hichem",
year = "2019",
month = nov,
day = "7",
language = "English",
volume = "2491",
series = "CEUR Workshop Proceedings",
publisher = "CEUR Workshop Proceedings",
booktitle = "Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn2019)",
note = "BNAIC 2019 ; Conference date: 07-11-2019 Through 08-11-2019",
url = "http://ceur-ws.org/Vol-2491/abstract81.pdf",
}