This paper presents a study of explainable AI methods applied to video anomaly detection. Specifically, we put forward a multidimensional evaluation protocol that assesses attribution methods by considering the correctness of the explanations, their plausibility with respect to ground-truth anomaly data, and the robustness of explanations across multiple time frames. We evaluate these metrics on common gradient-based and perturbation-based explanation techniques, which we use to explain a 3DCNN-based classifier trained on real video data. Our results show that each method entails trade-offs in explanation performance, including the higher computational cost incurred on video data. In particular, gradient-based methods achieve higher robustness across multiple frames, whereas perturbation-based methods achieve higher model fidelity scores.
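The perturbation-based techniques discussed in the abstract broadly follow an occlusion scheme: mask part of the input and record how much the model's score drops. A minimal sketch under assumed simplifications follows; the stand-in model and all names here are illustrative, not the paper's actual 3D-CNN or metrics:

```python
def toy_anomaly_score(video):
    """Stand-in for a 3D-CNN anomaly classifier: mean intensity of the clip.
    video is a list of frames; each frame is a list of rows of floats."""
    flat = [v for frame in video for row in frame for v in row]
    return sum(flat) / len(flat)

def frame_occlusion_attribution(video, model, baseline=0.0):
    """Per-frame attribution: the score drop when that frame is replaced
    by a constant baseline (a crude temporal occlusion map)."""
    original = model(video)
    attributions = []
    for t, frame in enumerate(video):
        occluded = list(video)
        occluded[t] = [[baseline] * len(row) for row in frame]
        attributions.append(original - model(occluded))
    return attributions

# Tiny 3-frame, 2x2 "clip": the second frame is the anomalous (bright) one.
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
attr = frame_occlusion_attribution(clip, toy_anomaly_score)
print(attr.index(max(attr)))  # -> 1 (the bright frame gets the largest attribution)
```

Each forward pass per occlusion is what drives the computational cost on video data mentioned in the abstract: masking along time (and, in practice, space) multiplies the number of model evaluations per clip.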
Zhang, X, Joukovsky, B & Deligiannis, N 2023, 'Quantitative Evaluation of Video Explainability Methods via Anomaly Localization', in 2023 31st European Signal Processing Conference (EUSIPCO), European Signal Processing Conference, IEEE, pp. 1235-1239, 31st European Signal Processing Conference, Helsinki, Finland, 4/09/23. https://doi.org/10.23919/EUSIPCO58844.2023.10290127
Zhang, X., Joukovsky, B., & Deligiannis, N. (2023). Quantitative Evaluation of Video Explainability Methods via Anomaly Localization. In 2023 31st European Signal Processing Conference (EUSIPCO) (pp. 1235-1239). (European Signal Processing Conference). IEEE. https://doi.org/10.23919/EUSIPCO58844.2023.10290127
@inproceedings{ee5f6288e73f46c293e06e79d6c39292,
title = "Quantitative Evaluation of Video Explainability Methods via Anomaly Localization",
abstract = "This paper presents a study of explainable AI methods applied to video anomaly detection. Specifically, we put forward a multidimensional evaluation protocol that assesses attribution methods by considering the correctness of the explanations, their plausibility with respect to ground-truth anomaly data, and the robustness of explanations across multiple time frames. We evaluate these metrics on common gradient-based and perturbation-based explanation techniques, which we use to explain a 3DCNN-based classifier trained on real video data. Our results show that each method entails trade-offs in explanation performance, including the higher computational cost incurred on video data. In particular, gradient-based methods achieve higher robustness across multiple frames, whereas perturbation-based methods achieve higher model fidelity scores.",
author = "Xinyue Zhang and Boris Joukovsky and Nikos Deligiannis",
note = "Funding Information: This research received funding from the FWO (Grant 1SB5721N), Belgium. Publisher Copyright: {\textcopyright} 2023 European Signal Processing Conference, EUSIPCO. All rights reserved.; 31st European Signal Processing Conference, EUSIPCO 2023; Conference date: 04-09-2023 through 08-09-2023",
year = "2023",
month = oct,
day = "29",
doi = "10.23919/EUSIPCO58844.2023.10290127",
language = "English",
series = "European Signal Processing Conference",
publisher = "IEEE",
pages = "1235--1239",
booktitle = "2023 31st European Signal Processing Conference (EUSIPCO)",
url = "https://eusipco2023.org/",
}