In the past decade, interest in intelligent applications, ranging from smart homes and healthcare to social networks and autonomous driving, has increased drastically. This has led to significant progress in machine learning research. Notably, deep learning, a subfield of machine learning, has gained popularity due to its superior performance in numerous computer vision and natural language processing tasks. For instance, deep learning models trained on large-scale datasets can generate realistic images, achieve near-human-level performance in image classification, or identify rare and interesting collision events in data collected at the Large Hadron Collider at CERN. Despite these advances, several challenges must still be addressed in order to fully harness the potential of deep learning methods. This thesis focuses on three such challenges: democratizing distributed learning, tackling task-specific problems during model optimization, and designing deep learning architectures for mobile devices.

The main challenge in distributed learning is the cost of transferring a huge amount of information at each training iteration. This problem is exacerbated when distributed learning is performed over a wireless network with limited bandwidth. The next challenge concerns the model optimization process: each task may pose specific problems that need to be handled. For example, if the number of images of one class in a classification dataset is significantly higher than that of another, the model will generalize poorly due to the class-imbalance problem. Thirdly, as mobile devices are an integral part of our lives, designing high-performance deep learning models for such devices is crucial.
This includes designing architectural modules that use the learnable parameters efficiently to provide the largest possible gain in performance.

Taking a step toward addressing these challenges, our first contribution in this thesis is a novel framework for distributed training. Our solution employs a lightweight neural network to compress the information sent at every iteration, thereby reducing the required communication bandwidth. The proposed framework reduces the amount of transmitted information by up to 877× compared with conventional distributed learning. Our second contribution is the introduction of two loss functions designed to tackle problems arising in image classification and single-image super-resolution. Finally, our third contribution is the development of a new family of compact models for on-device inference and an efficient architectural unit for real-time semantic segmentation.
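The thesis's framework compresses per-iteration updates with a lightweight neural network; as a simpler, generic illustration of the same bandwidth-saving principle (not the method proposed in the thesis), a sketch of top-k gradient sparsification shows how a worker can transmit only a few (index, value) pairs instead of a dense gradient:

```python
def top_k_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient vector;
    the worker transmits (index, value) pairs instead of the dense vector."""
    indexed = sorted(enumerate(grad), key=lambda iv: abs(iv[1]), reverse=True)
    return indexed[:k]

def densify(pairs, length):
    """Server side: reconstruct a dense gradient from the transmitted pairs,
    filling untransmitted positions with zero."""
    out = [0.0] * length
    for i, v in pairs:
        out[i] = v
    return out

# Toy gradient: only two entries carry most of the magnitude.
grad = [0.01, -2.5, 0.003, 1.7, -0.02, 0.0005]
pairs = top_k_sparsify(grad, 2)       # 2 pairs sent instead of 6 floats
recovered = densify(pairs, len(grad))
```

Sending k pairs instead of the full vector trades a small approximation error for a large reduction in communicated bytes; learned compressors push this trade-off further than fixed sparsification rules.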
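The class-imbalance problem mentioned above is commonly mitigated by reweighting the loss so that rare classes contribute more per example; a minimal sketch of inverse-frequency weighting (a standard baseline, not necessarily the loss functions proposed in the thesis):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalized so the weights sum to the number of classes."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    raw = {c: total / counts[c] for c in counts}
    scale = n_classes / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}

# Toy imbalanced label set: class 0 appears 9x more often than class 1.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
# The minority class receives a 9x larger weight than the majority class,
# so misclassifying a rare example is penalized proportionally more.
```

Such weights are typically passed to a weighted cross-entropy loss during training, which counteracts the tendency of the model to ignore under-represented classes.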