Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Ronald Poppe, Remco Veltkamp
Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spatio-temporal networks. We introduce Class Feature Pyramids, a method that traverses the entire network structure and incrementally discovers kernels at different network depths that are informative for a specific class. Our method does not depend on the network's architecture or the type of 3D convolutions, supporting grouped and depth-wise convolutions, convolutions in fibers, and convolutions in branches. We demonstrate the method on six state-of-the-art 3D convolution neural networks (CNNs) on three action recognition (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).
Stergiou, A, Kapidis, G, Kalliatakis, G, Chrysoulas, C, Poppe, R & Veltkamp, R 2019, Class feature pyramids for video explanation. in Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, 9022210, Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, Institute of Electrical and Electronics Engineers Inc., pp. 4255-4264, 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019, Seoul, Republic of Korea, 27/10/19. https://doi.org/10.1109/ICCVW.2019.00524
Stergiou, A., Kapidis, G., Kalliatakis, G., Chrysoulas, C., Poppe, R., & Veltkamp, R. (2019). Class feature pyramids for video explanation. In Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 (pp. 4255-4264). Article 9022210 (Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCVW.2019.00524
@inproceedings{70c8664a900244f5999126fbe3da41e5,
title = "Class feature pyramids for video explanation",
abstract = "Deep convolutional networks are widely used in video action recognition. 3D convolutions are one prominent approach to deal with the additional time dimension. While 3D convolutions typically lead to higher accuracies, the inner workings of the trained models are more difficult to interpret. We focus on creating human-understandable visual explanations that represent the hierarchical parts of spatio-temporal networks. We introduce Class Feature Pyramids, a method that traverses the entire network structure and incrementally discovers kernels at different network depths that are informative for a specific class. Our method does not depend on the network's architecture or the type of 3D convolutions, supporting grouped and depth-wise convolutions, convolutions in fibers, and convolutions in branches. We demonstrate the method on six state-of-the-art 3D convolution neural networks (CNNs) on three action recognition (Kinetics-400, UCF-101, and HMDB-51) and two egocentric action recognition datasets (EPIC-Kitchens and EGTEA Gaze+).",
keywords = "Saliency-visualization, Spatio-temporal-cnns, Visual-explanations",
author = "Alexandros Stergiou and Georgios Kapidis and Grigorios Kalliatakis and Christos Chrysoulas and Ronald Poppe and Remco Veltkamp",
note = "Funding Information: This work is supported by the Netherlands Organization for Scientific Research (NWO) with a TOP-C2 grant for Automatic recognition of bodily interactions (ARBITER) and the EU H2020 research and innovation program under the Marie Sk{\l}odowska-Curie grant agreement No 676157 (ACROSSING). Publisher Copyright: {\textcopyright} 2019 IEEE. 17th IEEE/CVF International Conference on Computer Vision Workshop, ICCVW 2019; Conference date: 27-10-2019 through 28-10-2019",
year = "2019",
month = oct,
doi = "10.1109/ICCVW.2019.00524",
language = "English",
series = "Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "4255--4264",
booktitle = "Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019",
address = "United States",
}