Our research aims to construct task-agnostic methods for optimizing models and training pipelines, with a focus on explainability. As a tool for explainability, we use concepts from information theory. In industry, the developed algorithms can reduce time, bandwidth, and energy consumption both during the training of neural networks and in production.
Most state-of-the-art results in signal processing are obtained with deep convolutional neural networks. There is also a strong dependency between the number of parameters in a deep model and its performance: more parameters generally mean better performance. For example, FixResNeXt-101 32x48d (Mahajan et al., 2018), a state-of-the-art model for image classification, contains approximately 800 million parameters, and BERT (Devlin et al., 2019), a recent model for natural language processing, contains 110 million in its base configuration. At the same time, increasing the number of parameters introduces obstacles: the model becomes harder to train, harder to fit into hardware, and harder to deploy on edge devices. We focus on exploring deep neural networks through the lens of information theory. Information-theoretic analysis can be applied to the activations of a neural network, to its gradients, and even to the architecture of the model. Moreover, the data itself can be examined and optimized before it is fed to the model, with the goal of obtaining the best possible generalization.
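As an illustration of what an information-theoretic measurement on activations might look like, the following minimal sketch estimates the Shannon entropy of a set of activation values by discretizing them into a histogram. It is not the method developed in this work: the function name `activation_entropy`, the bin count, and the synthetic Gaussian "pre-activations" are all assumptions chosen for the example, and histogram-based estimates are known to be sensitive to the choice of binning.

```python
import numpy as np


def activation_entropy(activations, num_bins=32):
    """Histogram-based estimate of Shannon entropy, in bits.

    Hypothetical helper for illustration: the values are flattened,
    split into `num_bins` equal-width bins, and the entropy of the
    resulting discrete distribution is returned.
    """
    values = np.asarray(activations, dtype=np.float64).ravel()
    counts, _ = np.histogram(values, bins=num_bins)
    # Empty bins contribute nothing to the entropy, so drop them
    # to avoid evaluating log2(0).
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


# Synthetic stand-in for one layer's outputs (an assumption, not real data):
# 256 samples, 64 units, before and after a ReLU nonlinearity.
rng = np.random.default_rng(0)
pre_activations = rng.normal(size=(256, 64))
relu_activations = np.maximum(pre_activations, 0.0)

entropy_pre = activation_entropy(pre_activations)
entropy_relu = activation_entropy(relu_activations)
print(f"entropy before ReLU: {entropy_pre:.3f} bits")
print(f"entropy after ReLU:  {entropy_relu:.3f} bits")
```

With `num_bins` bins the estimate is bounded above by `log2(num_bins)` bits. In this synthetic setting the ReLU maps every negative value to zero, which concentrates roughly half of the probability mass in a single bin and lowers the estimate. The same kind of measurement could in principle be applied to gradients instead of activations.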