Chen, Y. (2024). Communication-efficient and privacy-preserving decentralized training of deep learning models. [PhD Thesis, Vrije Universiteit Brussel].
@phdthesis{1acd0b5d612b4ab1b3225a335acf250c,
title = "Communication-efficient and privacy-preserving decentralized training of deep learning models",
abstract = "The rapid advancement of deep learning models, particularly in Natural Language Processing (NLP) and Computer Vision (CV), has significantly impacted various domains, such as content creation in media, medical diagnostics, and autonomous systems. For instance, Vision Transformer (ViT) models, especially the ViT-Huge variant with 632 million parameters, demonstrate outstanding performance on image classification benchmarks. Their superior capability is crucial for the development of visual perception systems in next-generation autonomous systems. Yet such deep learning models demand considerable computational resources and extensive training datasets. Decentralized training frameworks emerge as a viable strategy to mitigate these challenges by distributing the computational load across multiple edge nodes. This framework not only accelerates the training process but also maintains data privacy, as the training data resides locally at the edge nodes and is not transmitted to a central server. Consequently, it encourages third parties to contribute more sensitive data by ensuring data privacy. Despite the benefits of decentralized training, its wider adoption is hampered by issues such as communication overhead and potential privacy breaches. The exchange of substantial gradient volumes between the central server and edge nodes necessitates high communication rates, which imposes bandwidth limitations and latency issues, particularly in scenarios lacking robust internet infrastructure. Moreover, privacy concerns have been amplified by studies demonstrating that malicious servers could deduce sensitive client attributes (e.g., gender, age) from shared gradients. A more alarming threat, Gradient Inversion Attacks (GIAs), can reconstruct clients{\textquoteright} training data from gradients, thereby extracting maximum information and posing serious privacy risks. These privacy risks impede the willingness of third parties to contribute training data. Taking a step toward addressing these challenges, this thesis explores three pivotal domains in decentralized training: (1) efficient communication, (2) privacy leakage assessment, and (3) privacy-preserving gradient-sharing techniques. Our first major contribution is a distributed Adam optimization approach paired with an aggressive gradient sparsification compression strategy tailored for transformer-based models. This approach drastically reduces gradient transmission to just 0.1% of its original size without compromising model efficacy. The second contribution presents a novel GIA that effectively compromises some well-established privacy-preserving gradient-sharing techniques that rely on stochasticity (perturbation) during edge training, thereby exposing their security vulnerabilities. Lastly, we introduce a learned lossy compression approach aimed at preventing information leakage, marking our third contribution.",
author = "Yiming Chen",
year = "2024",
language = "English",
school = "Vrije Universiteit Brussel",
}