On August 29, 2022 at 14:00, Lusine Abrahamyan will defend her PhD thesis, entitled “Optimizing Deep Learning Methods for Computer Vision”.
Everybody is invited to attend the presentation live (in room D0.0.8) or online via this link.
In the past decade, interest in intelligent applications, ranging from smart homes and healthcare to social networks and autonomous driving, has drastically increased. This has led to significant progress in machine learning research. Notably, deep learning, a subfield of machine learning, has gained popularity due to its superior performance on numerous computer vision and natural language processing tasks. For instance, deep learning models trained on large-scale datasets can identify rare and interesting collision events in the data collected at the Large Hadron Collider at CERN. Despite this progress, several challenges still need to be addressed to fully harness the potential of deep learning methods. This thesis focuses on three of them: democratizing distributed learning, tackling task-specific problems during model optimization, and designing deep learning architectures for mobile devices.

The challenge in distributed learning is the cost of transferring a huge amount of information at every training iteration. The problem becomes even more severe when distributed learning is performed over a wireless network, where bandwidth is limited. The next challenge concerns the model optimization process: every task has specific problems that need to be handled. For example, if the number of images of one class in a classification dataset is significantly higher than that of another class, the model generalizes poorly due to this class imbalance. Thirdly, as mobile devices are an integral part of our lives, designing high-performance deep learning models for such devices is crucial. This includes designing architectural modules that use the learnable parameters efficiently, so that they provide the highest possible gain in performance.

Taking a step towards addressing these challenges, our first contribution in this thesis is a novel framework for distributed training. It employs a lightweight neural network to compress the information sent at every iteration, thereby reducing the required communication bandwidth; the proposed framework can reduce the amount of transmitted information by up to 877x. Our second contribution is the introduction of two loss functions designed to tackle problems arising in image classification and single-image super-resolution. Finally, our third contribution is the development of a new family of compact models for on-device inference and an efficient architectural unit for real-time semantic segmentation.
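To give a flavour of the general idea behind the first contribution, the sketch below shows, in PyTorch, how a small learned network could compress gradients before they are transmitted and reconstruct them on the receiving side. This is a hypothetical, minimal illustration only: the class name, the chunking scheme, and the layer sizes are assumptions for the example and are not taken from the thesis framework.

```python
import torch
import torch.nn as nn


class GradientAutoencoder(nn.Module):
    """Toy autoencoder that maps a flattened gradient chunk to a short code.

    Hypothetical illustration: the thesis uses its own lightweight compressor;
    the names and sizes here are made up for the example.
    """

    def __init__(self, chunk_size: int = 1024, code_size: int = 16):
        super().__init__()
        self.encoder = nn.Linear(chunk_size, code_size)
        self.decoder = nn.Linear(code_size, chunk_size)

    def compress(self, grad_chunk: torch.Tensor) -> torch.Tensor:
        # What a worker would actually send over the network.
        return self.encoder(grad_chunk)

    def decompress(self, code: torch.Tensor) -> torch.Tensor:
        # Reconstruction performed on the receiving side.
        return self.decoder(code)


if __name__ == "__main__":
    compressor = GradientAutoencoder(chunk_size=1024, code_size=16)

    # Pretend this is one worker's gradient, flattened and split into 8 chunks.
    grad = torch.randn(8, 1024)

    code = compressor.compress(grad)      # 8 x 16 values transmitted
    recon = compressor.decompress(code)   # 8 x 1024 values recovered remotely

    ratio = grad.numel() / code.numel()
    print(f"compression ratio: {ratio:.0f}x")  # 64x in this toy configuration
```

In this toy configuration the compression ratio follows directly from the chunk and code sizes; the ratios reported in the thesis depend on its own compressor design and training procedure.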