Communication-efficient and privacy-preserving decentralized training of deep learning models

Abstract

Rapid advances in deep learning, particularly in Natural Language Processing (NLP) and Computer Vision (CV), have significantly impacted domains such as media content creation, medical diagnostics, and autonomous systems. For instance, Vision Transformer (ViT) models, especially the ViT-Huge variant with 632 million parameters, demonstrate outstanding performance on image classification benchmarks. This capability is crucial for developing visual perception in next-generation autonomous systems. Yet such models demand considerable computational resources and extensive training datasets. Decentralized training frameworks, which distribute the computational load across multiple edge nodes, are a viable strategy for mitigating these challenges. Such a framework not only accelerates training but also helps maintain data privacy, as the training data resides locally at the edge nodes and is not transmitted to a central server. Consequently, it encourages third parties to contribute more sensitive data. Despite these benefits, wider adoption of decentralized training is hampered by communication overhead and potential privacy breaches. Exchanging large volumes of gradients between the central server and edge nodes requires high communication bandwidth, which creates bottlenecks and latency, particularly where robust internet infrastructure is lacking. Moreover, privacy concerns have been amplified by studies demonstrating that malicious servers can infer sensitive client attributes (e.g., gender, age) from shared gradients. A more alarming threat, the Gradient Inversion Attack (GIA), can reconstruct clients' training data from gradients, thereby extracting maximum information and posing serious privacy risks.
These privacy risks reduce the willingness of third parties to contribute training data. As a step toward addressing these challenges, this thesis explores three key areas of decentralized training: (1) efficient communication, (2) privacy leakage assessment, and (3) privacy-preserving gradient-sharing techniques. Our first major contribution is a distributed Adam optimization approach paired with an aggressive gradient sparsification strategy tailored for transformer-based models. This approach drastically reduces gradient transmission to just 0.1% of its original size without compromising model efficacy. The second contribution is a novel GIA that effectively compromises some well-established privacy-preserving gradient-sharing techniques that rely on stochasticity (perturbation) during edge training, thereby exposing their security vulnerabilities. Lastly, as our third contribution, we introduce a learned lossy compression approach aimed at preventing information leakage.
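
To make the 0.1% figure concrete, the following is a minimal sketch of gradient sparsification by top-k magnitude selection. The abstract does not specify the sparsification rule used in the thesis, so top-k selection is an assumption here, chosen as a common baseline. The function names `topk_sparsify` and `densify` are hypothetical, and the gradient is random illustrative data rather than a real model gradient.

```python
import heapq
import random


def topk_sparsify(grad, keep_ratio=0.001):
    """Keep only the largest-magnitude entries of a flat gradient.

    Assumption: top-k magnitude selection stands in for the thesis's
    (unspecified) sparsification rule. keep_ratio=0.001 means 0.1%.
    Returns (indices, values) for the kept entries.
    """
    k = max(1, round(len(grad) * keep_ratio))
    # Indices of the k entries with the largest absolute value.
    indices = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    indices.sort()
    values = [grad[i] for i in indices]
    return indices, values


def densify(indices, values, size):
    """Rebuild a dense gradient; entries that were not sent become zero."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Illustrative data only: a random "gradient" with 10,000 entries.
random.seed(0)
grad = [random.gauss(0.0, 1.0) for _ in range(10_000)]

indices, values = topk_sparsify(grad, keep_ratio=0.001)
restored = densify(indices, values, len(grad))
print(len(values), "of", len(grad), "entries transmitted")
```

In this sketch the sender transmits the index list as well as the values, so the real message is somewhat larger than 0.1% of the dense gradient. The sketch also does nothing with the dropped entries; whether and how the thesis compensates for them is not stated in the abstract.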