Multi-modal neural networks with multi-scale RGB-T fusion for semantic segmentation
 
Yangxintong Lyu, Adrian Munteanu
 
Abstract 

A novel deep-learning-based method for semantic segmentation of RGB and thermal (RGB-T) images is introduced. The proposed method employs a neural network design for multi-modal fusion based on multi-resolution patch processing. A dedicated decoder module fuses the RGB and thermal features extracted by separate encoder streams. Experimental results on synthetic and real-world data demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
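To make the overall data flow concrete, the following is a minimal NumPy sketch of the general idea described above: each modality is processed by its own encoder into a multi-resolution feature pyramid, and a decoder fuses the per-scale RGB and thermal features before combining them at full resolution. All function names, the choice of average pooling, the fusion by channel concatenation, and the toy image sizes are illustrative assumptions, not the paper's actual architecture, which uses learned convolutional encoders and a dedicated fusion decoder.

```python
import numpy as np

def avg_pool(x, k):
    # Downsample an HxWxC array by factor k with average pooling
    # (stand-in for a learned downsampling encoder stage).
    h, w, c = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def encode(x, scales=(1, 2, 4)):
    # Toy single-modality "encoder": a multi-resolution feature pyramid.
    return [avg_pool(x, s) if s > 1 else x for s in scales]

def fuse_decoder(rgb_feats, thermal_feats):
    # Toy "decoder": fuse per-scale RGB/thermal features by channel
    # concatenation, upsample each scale to full resolution, and average.
    h, w, _ = rgb_feats[0].shape
    c = rgb_feats[0].shape[2] + thermal_feats[0].shape[2]
    out = np.zeros((h, w, c))
    for r, t in zip(rgb_feats, thermal_feats):
        fused = np.concatenate([r, t], axis=-1)   # per-scale modality fusion
        s = h // fused.shape[0]
        out += np.repeat(np.repeat(fused, s, axis=0), s, axis=1)  # nearest-neighbor upsample
    return out / len(rgb_feats)

rgb = np.random.rand(8, 8, 3)      # dummy 3-channel RGB image
thermal = np.random.rand(8, 8, 1)  # dummy 1-channel thermal image
fused = fuse_decoder(encode(rgb), encode(thermal))
```

In a real segmentation network the fused feature map would be followed by a per-pixel classification head producing one score per semantic class; here the sketch stops at the fused multi-scale features to keep the fusion structure visible.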