
Master theses

Current and past ideas and concepts for Master Theses.

Explaining with prototypes and complementary examples


Explainability is a key sought-after property given the meteoric rise of deep learning models. While various prior approaches aim at mitigating the opaque nature of deep learning models, evidence that such approaches actually improve the credibility of these systems is still missing. Furthermore, they have mostly been applied to unimodal data, while the world surrounding us is increasingly multimodal in nature. In this thesis, we aim to solve a visual understanding task while generating textual explanations that provide justification based on the evidence.

The cognitive ability of machines has improved markedly thanks to progress in deep learning models. An interesting intersection lies in using videos together with the multiple modalities that accompany them, such as captions, text, and audio. Using these multiple cues of information is long grounded in theories of how the human brain accommodates them at a given moment. Although DNNs can accommodate multiple channels, they are still treated as black-box models, often failing to express or point towards the evidence that could improve the task at hand. As humans, a natural approach to explaining is to provide examples: examples give a concrete understanding of abstract concepts.

Kind of work

In this thesis, we will explore the generation of visual explanations with visual examples. Specifically, when provided with a visual example, we aim not only to classify the image but also to provide a textual justification for that decision. We will look at the task of zero-shot learning [MH16], where we will provide textual justification for the classification result.
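To make the zero-shot setting concrete, the following minimal sketch classifies an unseen image by comparing its embedding against class-description embeddings in a shared visual-linguistic space, then emits a template-based justification. All embeddings, class names, and descriptions here are hypothetical toy values; in practice they would come from trained visual and text encoders as in [MH16].

```python
import numpy as np

# Hypothetical pre-computed embeddings in a shared visual-linguistic space.
# In a real system these would be produced by trained encoders.
class_names = ["zebra", "horse"]
class_descriptions = {
    "zebra": "a striped, horse-like animal",
    "horse": "a large four-legged animal with a mane",
}
text_embeddings = np.array([[0.9, 0.1, 0.2],    # "zebra" description
                            [0.2, 0.8, 0.1]])   # "horse" description
image_embedding = np.array([0.85, 0.15, 0.25])  # unseen test image

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Zero-shot prediction: nearest class description in the shared space.
scores = [cosine(image_embedding, t) for t in text_embeddings]
pred = class_names[int(np.argmax(scores))]

# Template-based textual justification built from the matched description.
justification = f"This is a {pred} because it looks like {class_descriptions[pred]}."
print(justification)
```

The key property of the zero-shot setup is that the classifier never sees training images of the target classes; it only matches visual features against linguistic descriptions.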

Framework of the Thesis

We will build on existing work on zero-shot learning [MH16]. The primary stream of work visualizes what the classifier weighs in its prediction by assigning an importance score to each element of its input space. We will build on works in prototype selection [BT11] and machine teaching [ASC+18], where the idea is to extract and represent the most significant example that best represents the underlying data distribution. Building on this, we will augment our visual prototype with an explainer that enables the model to generate linguistic explanations.
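As a toy illustration of the prototype idea, the sketch below picks the medoid of a class — the sample with the smallest total distance to all other samples — as its prototype. This is a deliberately simple stand-in for the set-cover formulation of [BT11]; the feature vectors are made up for illustration, whereas in practice they would be deep features.

```python
import numpy as np

# Toy 2-D feature vectors for one class; in practice these would be
# deep features extracted by a trained network.
X = np.array([[1.0, 1.0],
              [1.2, 0.9],
              [0.9, 1.1],
              [3.0, 3.0]])   # the last point is an outlier

# Simple prototype criterion (a stand-in for the set-cover formulation
# of Bien & Tibshirani): choose the medoid, i.e. the sample with the
# smallest total Euclidean distance to all other samples.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
prototype_idx = int(np.argmin(dists.sum(axis=1)))

print("prototype:", X[prototype_idx])
```

Because the medoid must itself be a real data point, the selected prototype can be shown to a user as a concrete example, which is exactly what makes prototype-based explanations interpretable.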

[ASC+18] Oisin Mac Aodha, Shihan Su, Yuxin Chen, Pietro Perona, and Yisong Yue. Teaching categories to human learners with visual explanations, 2018.
[BT11] Jacob Bien and Robert Tibshirani. Prototype selection for interpretable classification. The Annals of Applied Statistics, 5(4), Dec 2011.
[MH16] Tanmoy Mukherjee and Timothy M. Hospedales. Gaussian visual-linguistic embedding for zero-shot recognition. In Jian Su, Xavier Carreras, and Kevin Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 912–918. The Association for Computational Linguistics, 2016.

Number of Students

1 (or 2)

Expected Student Profile

Programming experience (preferably Python). Familiarity with Computer Vision and Natural Language Processing.


Prof. Dr. Ir. Nikolaos Deligiannis

+32 (0)2 629 1683



Dr. Tanmoy Mukherjee

+32 (0)2 629 2930




©2022 • Vrije Universiteit Brussel • ETRO Dept. • Pleinlaan 2 • 1050 Brussels • Tel: +32 2 629 2930 (secretariat) • Fax: +32 2 629 2883