In recent years, easy access to massive labelled datasets, increased computing power provided by GPUs and pre-trained models built by experts have allowed Deep Learning to dominate many computer vision and pattern recognition tasks. Before the breakthrough of AlexNet in 2012, models were trained on handcrafted features designed by human researchers. Deep learning allows machines to learn the features that optimally represent the data for a specific problem. In medical applications, the transition from systems that use handcrafted features to systems that learn features from the data has been gradual. The number of deep learning applications in medical image analysis grew rapidly in 2015 and 2016, and deep learning is now dominant at major conferences and competitions.
Thus, in applications involving natural images, very few people train an entire Convolutional Neural Network (CNN) from scratch with random weight initialization. Instead, it is common to pre-train a CNN on a very large dataset, such as ImageNet, which contains 1.2 million images in 1,000 categories. This CNN is then used as a fixed feature extractor, and it is also possible to fine-tune one or more of its layers on the new dataset.
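The following is a minimal sketch of this transfer-learning workflow in PyTorch, assuming torchvision is available; the network choice (ResNet-18), the number of target classes and the decision to unfreeze only the last convolutional block are illustrative placeholders, not choices prescribed by this work.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Load a CNN pre-trained on ImageNet (1.2M images, 1000 categories).
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

# Use it as a fixed feature extractor: freeze all pre-trained weights.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task
# (num_classes is a hypothetical value for the target dataset).
num_classes = 2
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optionally fine-tune the last convolutional block as well.
for param in model.layer4.parameters():
    param.requires_grad = True

# Only the un-frozen parameters are updated during training.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9,
)
```

Freezing everything except the new head corresponds to the fixed-feature-extractor setting described above; unfreezing later blocks moves the setup towards full fine-tuning.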
However, in medical applications pre-training and fine-tuning are more challenging: there are various imaging modalities, and 3D inputs are often required, which are not compatible with conventional 2D pre-trained networks. Furthermore, the available data in these applications is limited, often to fewer than 1,000 images. One of the main challenges in applying deep learning is therefore to build a model from a limited number of training samples without suffering from overfitting. One strategy is to reuse models pre-trained on natural images, but this is not always possible: transfer learning only succeeds when the target images resemble natural images closely enough that similar features can be extracted and still represent the original data.
Data augmentation techniques such as translation and rotation can be used to further increase the size of the training data and avoid overfitting. Using ReLUs as the activation function, together with batch normalization, dropout and momentum, has also been shown to help deep models converge without overfitting. Furthermore, unsupervised training techniques such as auto-encoders or Restricted Boltzmann Machines (RBMs) can be employed to build generic deep learning models. These algorithms are trained on unlabelled data to find patterns, such as latent subspaces.
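A minimal PyTorch sketch of the first two ideas is given below, assuming torchvision transforms for on-the-fly augmentation; the patch size (64x64, single channel), the specific augmentation parameters and the two-class output are illustrative assumptions rather than settings used in this work.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: random rotations, translations and flips applied on the
# fly, so each epoch sees slightly perturbed versions of the training patches.
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Small CNN with ReLU activations, batch normalization and dropout,
# for hypothetical 64x64 single-channel image patches and 2 classes.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),                      # 64 -> 32
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),                      # 32 -> 16
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(32 * 16 * 16, 2),
)

# Stochastic gradient descent with momentum, as referred to in the text.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```

The augmentation pipeline enlarges the effective training set without new labels, while batch normalization, dropout and momentum act as regularization and optimization aids during training.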
The main aim of this PhD is to investigate techniques, such as data augmentation, that reduce the need for large volumes of labelled data while maintaining the performance of deep learning models. We investigate applications in medical image analysis, where labelled data is scarce or costly to obtain. More specifically, we focus on detecting lung nodules, as they can indicate early-stage lung cancer.