Publication Details
Iman Marivani



Big datasets contain correlated heterogeneous data acquired by diverse modalities, e.g., photography, multispectral and infrared imaging, as well as computed tomography (CT), X-radiography, and ultrasound sensors in medical imaging and non-destructive testing. While some modalities can easily be captured in high resolution, in practice others are more susceptible to environmental noise and are mainly available in low resolution due to time constraints and the per-pixel cost of the corresponding sensors. Hence, multimodal image restoration, which refers to the reconstruction of one modality guided by another, and multimodal image fusion, that is, the fusion of images from different sources into a single, more comprehensive one, are important computer vision problems. In this PhD research, we focus on designing deep unfolding networks for multimodal image restoration and fusion.

Analytical methods for image restoration and fusion rely on solving complex optimization problems at both training and inference, making them computationally expensive. Deep learning methods can learn a nonlinear mapping between the input and the desired output from data, delivering high accuracy at a low computational cost during inference. However, existing deep models behave like black boxes and do not incorporate any prior knowledge. Recently, deep unfolding introduced the idea of integrating domain knowledge in the form of signal priors, e.g., sparsity, into single-modal neural network architectures.

In this thesis, we present multimodal deep unfolding designs based on coupled convolutional sparse coding for multimodal image restoration and fusion. We propose two formulations for multimodal image restoration in the form of coupled convolutional sparse coding problems. The first formulation assumes that the representations of the guidance modality are provided and fixed.
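As a rough sketch of what such a coupled formulation can look like (the exact objective in the thesis may differ), sparse coding with side information from a guidance modality is often posed as an $\ell_1$-$\ell_1$ minimization, where $\boldsymbol{x}$ is the target-modality image, $D$ a (convolutional) dictionary, and $\boldsymbol{w}$ a fixed sparse code obtained from the guidance modality:

\[
\min_{\boldsymbol{z}} \; \tfrac{1}{2}\,\|\boldsymbol{x} - D\boldsymbol{z}\|_2^2 \;+\; \lambda \left( \|\boldsymbol{z}\|_1 + \|\boldsymbol{z} - \boldsymbol{w}\|_1 \right)
\]

The first $\ell_1$ term enforces sparsity of the target code $\boldsymbol{z}$, while the second couples the modalities by encouraging $\boldsymbol{z}$ to stay close to the guidance code $\boldsymbol{w}$.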
The second formulation, in contrast, allows intermediate refinement of both modalities to produce a more suitable guidance representation for the reconstruction. We design two categories of multimodal CNNs by adopting two optimization techniques, namely proximal algorithms and the method of multipliers, to solve the corresponding sparse coding problems. We also design a multimodal image fusion model based on the second formulation. Our deep unfolding models are extensively evaluated on several benchmark multimodal image datasets for the applications of multimodal image super-resolution and denoising, as well as multi-focus and multi-exposure image fusion.
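To make the deep unfolding idea concrete, below is a minimal single-modal sketch in plain NumPy (function names are hypothetical, and a generic dictionary stands in for the convolutional, coupled dictionaries used in the thesis): each iteration of the ISTA proximal algorithm for sparse coding is treated as one network layer, and in a trained deep unfolding model the per-layer step sizes, thresholds, and linear operators would become learnable parameters.

```python
import numpy as np

def soft_threshold(z, theta):
    """Proximal operator of the l1 norm: the layer's nonlinearity."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def unfolded_ista(x, D, n_layers=200, lam=0.02):
    """Truncated ISTA viewed as an n_layers-deep network.
    In a deep unfolding design, the step size, the threshold, and
    the matrices applied to z and x would be learned from data
    instead of being fixed analytically as they are here."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    z = np.zeros(D.shape[1])
    for _ in range(n_layers):
        # gradient step on 0.5*||D z - x||^2, then proximal step
        z = soft_threshold(z - step * D.T @ (D @ z - x), lam * step)
    return z

# Toy usage: recover a sparse code from a noiseless measurement.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)   # unit-norm dictionary atoms
z_true = np.zeros(50)
z_true[[3, 17]] = 1.0, -0.5      # a 2-sparse ground-truth code
x = D @ z_true
z_hat = unfolded_ista(x, D)
```

Truncating the iteration at a fixed, small depth is what makes inference cheap relative to running the analytical solver to convergence, while the interpretable structure of each layer is what distinguishes unfolded networks from black-box CNNs.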