ETROVUB

Date: 18/06/2024

On June 20th 2024 at 16:00, Yangxintong Lyu will defend their PhD thesis entitled “DEEP-LEARNING-BASED MULTI-MODAL FUSION FOR TRAFFIC IMAGE DATA PROCESSING”.

Everybody is invited to attend the presentation in room I.0.02, or digitally via this link.

Abstract

In recent years, deep-learning-based technologies have advanced significantly, driven by large amounts of data with task-specific labels. Among the various formats used for representing object attributes in computer vision, RGB images stand out as a ubiquitous choice. Their value extends to traffic-related applications, particularly autonomous driving and intelligent surveillance systems. With an autonomous driving system, a car can navigate and operate with reduced human intervention, while an intelligent surveillance system can monitor and analyse traffic conditions. In essence, these techniques reduce human error and improve road safety, with a significant impact on our daily lives.

Although many vision-based traffic analysis tasks can be solved effectively using features extracted from RGB images alone, some challenges remain unresolved and make certain situations difficult. First, extracting complex information becomes demanding, especially under erratic lighting conditions, which raises the need for auxiliary cues. Second, obtaining accurate labels at large scale for challenging tasks remains time-consuming, costly, and arduous. The first challenge prompts exploration into capturing and exploiting additional information so that objects can be observed from diverse aspects; the second requires either an increase in the volume of available data or the capability to learn from other datasets that already possess perfect labels.

In this thesis, we tackle multi-modal data fusion and data scarcity for intelligent transportation systems. Our first contribution is a novel RGB-Thermal fusion neural network for semantic segmentation, which enables segmentation under limited illumination. Our second contribution is a 3D-prior-based framework for monocular vehicle 6D pose estimation. The use of 3D geometry avoids the ill-posed problem of predicting pose from a single camera viewpoint. Thanks to the extra 3D information, our method can handle distant and occluded vehicles. The third contribution is a real-world, large-scale vehicle make and model dataset that contains the most popular brands operating in Europe. Moreover, we propose a two-branch deep learning paradigm for vehicle make and model recognition that reduces inter-make ambiguity. The last contribution is a weakly supervised vehicle 6D pose estimation paradigm that adapts knowledge learned from a novel synthetic dataset, which includes a large number of accurate vehicle labels. By learning from the synthetic dataset, our method significantly reduces the need for expensive real-life vehicle pose annotations.

Comprehensive experimental results show that the newly introduced datasets hold significant promise for deep-learning-based processing of traffic image data. Moreover, the proposed methods surpass the existing baselines in the literature. This research has not only yielded high-quality scientific publications but also demonstrated its value in both academic and industrial domains.