Multispectral Visual Place Recognition for Long-Range Drone Loop Closure
Visual Place Recognition (VPR) enables loop closure in SLAM systems and bounds long-term drift, yet state-of-the-art methods (NetVLAD, AnyLoc, CosPlace) operate exclusively on RGB images and degrade significantly under appearance changes caused by lighting variation (day vs. dusk), weather (clear vs. fog), or seasonal shifts. From an aerial platform, these difficulties are compounded by altitude-dependent viewpoint variation. Thermal-infrared imagery offers a complementary signal: the thermal signatures of buildings, roads, vegetation, and water bodies remain far more stable across illumination and weather changes than their RGB appearance. The goal of this master thesis is to develop a multispectral VPR system that learns joint descriptors from RGB and thermal images, integrated as a loop-closure backend into a VTIO or R-VTIO pipeline to correct accumulated drift on long-duration GNSS-denied drone flights.
The student will collect a multispectral place-recognition dataset by flying repeated trajectories over the same area at different times of day, weather conditions, and altitudes, recording synchronised RGB and thermal imagery with GNSS RTK ground truth for place-identity labels. Three fusion architectures will be implemented and compared: early fusion (4-channel input to a modified NetVLAD/AnyLoc backbone), mid-level fusion (dual-encoder with cross-attention), and late fusion (separate RGB and thermal descriptors combined via learned weighting). The models will be trained on the collected dataset and evaluated on recall@1 and recall@5 against RGB-only AnyLoc and NetVLAD baselines under matched and cross-condition retrieval. The best model will be integrated as a loop-closure module in the VTIO or R-VTIO pipeline, end-to-end trajectory accuracy on long flights (10+ minutes) will be measured, and a research paper will be prepared.
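To make the simplest two fusion variants concrete, the snippet below sketches a late-fusion head with a learned RGB/thermal weighting and the 4-channel input adaptation needed for early fusion. All names, the scalar weighting, and the mean-initialisation of the thermal input channel are illustrative assumptions, not the thesis implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LateFusion(nn.Module):
    """Hypothetical late-fusion head: combines precomputed RGB and
    thermal descriptors with a single learned weight (a sketch only)."""
    def __init__(self):
        super().__init__()
        # learnable logit controlling the RGB/thermal mix
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, d_rgb: torch.Tensor, d_thermal: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.alpha)              # weight in (0, 1)
        fused = w * d_rgb + (1.0 - w) * d_thermal
        return F.normalize(fused, dim=-1)          # unit-length joint descriptor

def widen_first_conv(conv: nn.Conv2d) -> nn.Conv2d:
    """Early fusion: widen a backbone's first conv from 3 to 4 input
    channels, copying the RGB weights and initialising the thermal
    channel with their mean (one common, but assumed, choice)."""
    new = nn.Conv2d(4, conv.out_channels, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = conv.weight
        new.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new
```

Mid-level fusion (dual encoders with cross-attention) would sit between these two extremes and is omitted here for brevity.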
Framework of the Thesis
The thesis will start with a literature review on Visual Place Recognition (NetVLAD, AnyLoc, CosPlace), metric learning and image retrieval, multispectral and thermal imaging, and loop-closure mechanisms in modern VIO and SLAM pipelines.
Next, the student will define the complete experimental framework: dataset-collection protocol (repeated flights over the same area under varying conditions), implementation of the three fusion architectures in PyTorch, training pipeline with appropriate metric-learning losses, and integration interface with the VTIO/R-VTIO loop-closure backend.
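One standard choice of metric-learning loss for this kind of retrieval training is the triplet margin loss on L2-normalised descriptors, sketched below; the exact loss, margin, and mining strategy used in the thesis may differ:

```python
import torch
import torch.nn.functional as F

def triplet_margin_loss(anchor: torch.Tensor,
                        positive: torch.Tensor,
                        negative: torch.Tensor,
                        margin: float = 0.1) -> torch.Tensor:
    """Triplet margin loss: pull same-place descriptor pairs together,
    push different-place pairs apart by at least `margin`."""
    anchor, positive, negative = (F.normalize(x, dim=-1)
                                  for x in (anchor, positive, negative))
    d_pos = (anchor - positive).norm(dim=-1)   # distance to same place
    d_neg = (anchor - negative).norm(dim=-1)   # distance to different place
    return F.relu(d_pos - d_neg + margin).mean()
```

With RTK-labelled place identities, anchors and positives are images of the same place under different conditions, and negatives are images of other places.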
In the final phase, the student will conduct experimental validation: training and benchmarking the three fusion architectures, evaluating against RGB-only baselines under matched and cross-condition retrieval, integrating the best model as a loop-closure module, and measuring end-to-end trajectory-drift reduction on long flights. Real-time onboard performance will also be assessed. The validation phase concludes with a publication-ready research paper.
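The recall@1 and recall@5 metrics used throughout the evaluation can be computed directly from descriptor matrices; below is a minimal NumPy sketch under cosine similarity (function and variable names are illustrative):

```python
import numpy as np

def recall_at_k(query_desc: np.ndarray, db_desc: np.ndarray,
                query_ids: np.ndarray, db_ids: np.ndarray, k: int) -> float:
    """Fraction of queries whose top-k nearest database descriptors
    (by cosine similarity) contain at least one image of the same place.
    `query_ids`/`db_ids` are integer place-identity labels."""
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sims = q @ d.T                            # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of k best matches
    hits = [(db_ids[idx] == qid).any() for idx, qid in zip(topk, query_ids)]
    return float(np.mean(hits))
```

Cross-condition retrieval is evaluated by drawing queries from one condition (e.g. dusk) and the database from another (e.g. day) while keeping the place labels fixed.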
Expected Student Profile
The ideal candidate has a solid background in computer vision and deep learning with hands-on PyTorch experience, particularly with metric learning, image retrieval, and CNN/Transformer architectures. Familiarity with robotics and ROS2 is required, alongside knowledge of SLAM and loop-closure mechanisms. Experience with thermal or multispectral imaging is a plus. Strong Python programming skills are essential, and the candidate should be comfortable with both training neural networks and conducting field data collection.