Overview
 
DATA ATTRIBUTION FOR GENERATIVE MODELS VIA UNLEARNING 
 
...
Subject 
Recent text-to-image models achieve impressive visual quality, but their training process
remains largely opaque. A key open question is which training samples most strongly
influence a generated image. Data attribution addresses this by identifying influential
training examples, while machine unlearning offers a practical mechanism to simulate the
removal of data from a trained model. Recent work shows that unlearning a synthesized
image can reveal which training images are most responsible for its generation. This topic
combines explainable AI and generative modeling, with applications in transparency,
copyright, dataset auditing, and responsible AI use.
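
As a minimal illustration of the idea, consider the sketch below, which scores training samples by how much their loss increases after the model is forced to forget a synthesized image. The plain gradient-ascent unlearning loop and all names (model, loss_fn, synth_batch) are illustrative assumptions, not the actual procedure of [1].

import copy
import torch

def attribute_by_unlearning(model, loss_fn, synth_batch, train_samples,
                            lr=1e-4, steps=10):
    # Per-sample losses before unlearning (no gradients needed here).
    with torch.no_grad():
        before = [loss_fn(model, x).item() for x in train_samples]

    # "Unlearn" the synthesized image by gradient ascent on its loss,
    # on a copy so the original model stays intact.
    unlearned = copy.deepcopy(model)
    opt = torch.optim.SGD(unlearned.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-loss_fn(unlearned, synth_batch)).backward()
        opt.step()

    # Training samples whose loss rises most after unlearning are taken
    # to be the most influential for generating the synthesized image.
    with torch.no_grad():
        after = [loss_fn(unlearned, x).item() for x in train_samples]
    scores = [a - b for a, b in zip(after, before)]
    ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranking, scores
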
Kind of work 
The thesis will study data attribution methods for text-to-image generative models, with a
particular focus on unlearning-based attribution. The central research question is how
machine unlearning can be used to identify influential training samples in a way that is
accurate, computationally feasible, and interpretable. A first objective is to review the
literature on data attribution, influence functions, diffusion models, and machine
unlearning, and to position unlearning-based attribution relative to feature-matching and
gradient-based approaches.

A second objective is to reproduce and analyze a recent unlearning-based attribution
framework for text-to-image models. This includes understanding the role of Fisher-based
regularization, the choice of trainable parameters, and the effect of catastrophic forgetting
during unlearning. A further goal is to investigate how attribution quality is measured, for
example through reconstruction-loss changes, retrieval metrics, or counterfactual leave-k-out
evaluations.
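
To make the regularization concrete, the sketch below shows one plausible way a diagonal Fisher estimate can anchor the unlearning step, in the spirit of EWC [6]. The diagonal approximation, the penalty weight lam, and the helper names are assumptions for illustration rather than the exact scheme of [1].

import torch

def diagonal_fisher(model, loss_fn, batches):
    # Diagonal Fisher estimate: average squared gradient per parameter,
    # computed on batches the model should *not* forget.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

def unlearning_step(model, loss_fn, synth_batch, fisher, anchor, opt, lam=1.0):
    # Ascend the loss on the synthesized image while a Fisher-weighted
    # quadratic penalty (cf. EWC [6]) keeps parameters that matter for
    # the retained data close to anchor, limiting catastrophic forgetting.
    opt.zero_grad()
    penalty = sum((fisher[n] * (p - anchor[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    (lam * penalty - loss_fn(model, synth_batch)).backward()
    opt.step()

Here anchor holds detached copies of the original parameters, taken before unlearning starts, e.g. anchor = {n: p.detach().clone() for n, p in model.named_parameters()}. Restricting named_parameters() to a subset (for example, cross-attention layers) is one way to probe the role of the trainable-parameter choice mentioned above.
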
A third objective is to explore one or more research extensions. Possible directions include
improving the efficiency of attribution, studying the effect of alternative regularization
schemes, extending attribution from whole images to local regions or semantic attributes,
or analyzing interactions between groups of training samples rather than ranking images
independently. Another possible direction is to compare unlearning-based attribution with
influence-function-based baselines under a common evaluation setup.
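
For the influence-function baseline, the classical estimate from Koh and Liang [4] scores a training point z against a query z_test as

\[
\mathcal{I}(z, z_{\mathrm{test}})
  = -\,\nabla_\theta L(z_{\mathrm{test}}, \hat\theta)^{\top}
     H_{\hat\theta}^{-1}\,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}\nabla_\theta^{2} L(z_i, \hat\theta),
\]

where \(\hat\theta\) denotes the parameters of the trained model. Ranking training points by this score approximates a leave-one-out counterfactual without retraining, which makes it a natural reference point for unlearning-based rankings.
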
The expected outcome is a clear experimental and methodological study of unlearning for
data attribution, together with an assessment of its strengths, limitations, and possible
improvements. The thesis should result in a reproducible implementation, a critical
comparison of methods, and concrete recommendations for future work on interpretable
and trustworthy generative AI.
Framework of the Thesis 
References and further reading
[1] Main paper of the topic: Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, and
Richard Zhang. Data attribution for text-to-image models by unlearning synthesized images.
arXiv:2406.09408, 2025.
[2] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer.
High-resolution image synthesis with latent diffusion models. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2022.
[3] Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, and Richard Zhang. Evaluating data attribution
for text-to-image models. In IEEE International Conference on Computer Vision (ICCV), 2023.
[4] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In
International Conference on Machine Learning (ICML), pages 1885–1894. PMLR, 2017.
[5] Chuan Guo, Tom Goldstein, Awni Hannun, et al. Certified data removal from machine learning
models. In International Conference on Machine Learning (ICML), 2020.
[6] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei
A. Rusu, Kieran Milan, John Quan, et al. Overcoming catastrophic forgetting in neural networks.
Proceedings of the National Academy of Sciences (PNAS), 2017.