Explainable artificial intelligence (XAI) has gained considerable attention in recent years as it aims to help humans better understand machine learning decisions, making complex black-box systems more trustworthy. Visual explanation algorithms have been designed to generate heatmaps highlighting the image regions that a deep neural network focuses on to make decisions. While convolutional neural network (CNN) models typically follow similar processing operations for feature encoding, the emergence of the vision transformer (ViT) has introduced a new approach to machine vision decision-making. An important question is therefore which architecture provides more human-understandable explanations. This paper examines the explainability of deep architectures, including CNN and ViT models, across different vision tasks. To this end, we first performed a subjective experiment asking humans to highlight the key visual features in images that helped them make decisions in two different vision tasks. Next, ground-truth heatmaps were generated from the human-annotated images and compared against the heatmaps produced by explanation methods for the deep architectures. Moreover, perturbation tests were performed to objectively evaluate the deep models' explanation heatmaps. According to the results, the explanations generated for ViT are deemed more trustworthy than those produced for CNNs, and the more dispersed the features of the input image, the more evident ViT's advantage becomes.
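The abstract describes two evaluation steps (comparing explanation heatmaps with human ground truth, and perturbation tests) without naming the exact metrics. The Python sketch below illustrates one plausible way to implement each, under stated assumptions: the human ground truth is formed by averaging binary annotator masks, agreement with an explanation heatmap is scored with Pearson correlation (CC), and the perturbation test is a deletion-style test that replaces the most salient pixels with a constant baseline. The function names (`ground_truth_heatmap`, `pearson_cc`, `deletion_auc`) and the toy model are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the metrics and protocol are assumptions, not the paper's method.
from math import sqrt
from statistics import fmean
from typing import Callable, List, Sequence

Heatmap = List[List[float]]  # row-major 2-D map, one value per pixel


def ground_truth_heatmap(masks: Sequence[Heatmap]) -> Heatmap:
    """Average binary annotation masks (1 = pixel highlighted by an annotator)
    into a per-pixel agreement map with values in [0, 1]."""
    n = len(masks)
    h, w = len(masks[0]), len(masks[0][0])
    return [[sum(m[i][j] for m in masks) / n for j in range(w)] for i in range(h)]


def flatten(x: Heatmap) -> List[float]:
    return [v for row in x for v in row]


def pearson_cc(a: Heatmap, b: Heatmap) -> float:
    """Pearson linear correlation between two heatmaps of the same shape."""
    xa, xb = flatten(a), flatten(b)
    ma, mb = fmean(xa), fmean(xb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    var_a = sum((p - ma) ** 2 for p in xa)
    var_b = sum((q - mb) ** 2 for q in xb)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant map
    return cov / sqrt(var_a * var_b)


def deletion_auc(score_fn: Callable[[Heatmap], float], image: Heatmap,
                 saliency: Heatmap, steps: int = 4, baseline: float = 0.0) -> float:
    """Replace pixels with `baseline`, most salient first, in `steps` chunks and
    return the area under the model-score curve on a [0, 1] x-axis.
    A lower value means the heatmap ranks decisive pixels first."""
    h, w = len(image), len(image[0])
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: saliency[ij[0]][ij[1]], reverse=True)
    img = [row[:] for row in image]  # work on a copy
    scores = [score_fn(img)]
    per_step = -(-len(order) // steps)  # ceiling division
    for k in range(steps):
        for i, j in order[k * per_step:(k + 1) * per_step]:
            img[i][j] = baseline
        scores.append(score_fn(img))
    # Trapezoidal rule with a uniform step of 1 / steps
    return sum((s0 + s1) / 2 for s0, s1 in zip(scores, scores[1:])) / steps


if __name__ == "__main__":
    masks = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]  # two annotators, 2x2 image
    gt = ground_truth_heatmap(masks)              # [[1.0, 0.5], [0.0, 0.0]]
    expl = [[0.9, 0.4], [0.1, 0.0]]               # toy explanation heatmap
    print("CC:", round(pearson_cc(gt, expl), 3))

    # Toy "model" whose score depends only on the top-left pixel
    score_fn = lambda img: img[0][0]
    image = [[1.0, 1.0], [1.0, 1.0]]
    print("deletion AUC, good map:", deletion_auc(score_fn, image, [[0.9, 0.1], [0.2, 0.3]]))
    print("deletion AUC, bad map:", deletion_auc(score_fn, image, [[0.0, 0.9], [0.8, 0.7]]))
```

In the toy run, the map that ranks the top-left pixel first drives the score to zero after the first deletion step (AUC 0.125), whereas the map that ranks it last keeps the score high until the final step (AUC 0.875). A real evaluation would use the model's class confidence as `score_fn` and a perturbation baseline chosen to suit the task.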
Yang, Y, Mahmoudpour, S, Schelkens, P & Deligiannis, N 2023, Evaluating Quality of Visual Explanations of Deep Learning Models for Vision Tasks. in International Conference on Quality of Multimedia Experience (QoMEX). 2023 15th International Conference on Quality of Multimedia Experience, QoMEX 2023, IEEE, pp. 159-164, 15th International Conference on Quality of Multimedia Experience (QoMEX), 20/06/23. https://doi.org/10.1109/QoMEX58391.2023.10178510
Yang, Y., Mahmoudpour, S., Schelkens, P., & Deligiannis, N. (2023). Evaluating Quality of Visual Explanations of Deep Learning Models for Vision Tasks. In International Conference on Quality of Multimedia Experience (QoMEX) (pp. 159-164). (2023 15th International Conference on Quality of Multimedia Experience, QoMEX 2023). IEEE. https://doi.org/10.1109/QoMEX58391.2023.10178510
@inproceedings{9b8a096762eb4410af28817bfe52f003,
title = "Evaluating Quality of Visual Explanations of Deep Learning Models for Vision Tasks",
abstract = "Explainable artificial intelligence (XAI) has gained considerable attention in recent years as it aims to help humans better understand machine learning decisions, making complex black-box systems more trustworthy. Visual explanation algorithms have been designed to generate heatmaps highlighting the image regions that a deep neural network focuses on to make decisions. While convolutional neural network (CNN) models typically follow similar processing operations for feature encoding, the emergence of the vision transformer (ViT) has introduced a new approach to machine vision decision-making. An important question is therefore which architecture provides more human-understandable explanations. This paper examines the explainability of deep architectures, including CNN and ViT models, across different vision tasks. To this end, we first performed a subjective experiment asking humans to highlight the key visual features in images that helped them make decisions in two different vision tasks. Next, ground-truth heatmaps were generated from the human-annotated images and compared against the heatmaps produced by explanation methods for the deep architectures. Moreover, perturbation tests were performed to objectively evaluate the deep models' explanation heatmaps. According to the results, the explanations generated for ViT are deemed more trustworthy than those produced for CNNs, and the more dispersed the features of the input image, the more evident ViT's advantage becomes.",
author = "Yuqing Yang and Saeed Mahmoudpour and Peter Schelkens and Nikos Deligiannis",
note = "Funding Information: This research received funding from the Flemish Government under the ``Onderzoeksprogramma Artifici{\"e}le Intelligentie (AI) Vlaanderen'' programme, from the FWO (Grant G0A4720N), and from imec through AAA project Trustworthy AI Methods (TAIM). Publisher Copyright: {\textcopyright} 2023 IEEE. 15th International Conference on Quality of Multimedia Experience (QoMEX); Conference date: 20-06-2023 through 22-06-2023",
year = "2023",
month = apr,
day = "21",
doi = "10.1109/QoMEX58391.2023.10178510",
language = "English",
series = "2023 15th International Conference on Quality of Multimedia Experience, QoMEX 2023",
publisher = "IEEE",
pages = "159--164",
booktitle = "International Conference on Quality of Multimedia Experience (QoMEX)",
}