On April 21, 2023 at 14:00, Quentin Bolsée will defend his PhD thesis, entitled “Calibration and Preprocessing of Light Field and Multiview Depth Systems”.
Everybody is invited to attend the presentation in room I.2.01 or via this link.
Recently, there has been increasing demand for high-quality 3D content, yet a gap remains between this demand and what real-time depth sensors can deliver. Active sensors such as Time-of-Flight cameras still produce excessively noisy data, while passive technologies (photogrammetry, light fields) coupled with depth estimation are nowhere near real-time and still suffer from missing information in challenging scenes. Deep learning has shown promising results in both areas, although the actual properties of physical sensors are almost always neglected.
In this work, the properties of multiview depth camera setups are thoroughly examined with the goal of producing a high-quality geometry acquisition system. First, a novel calibration step is proposed for globally optimizing the parameters of the multiple cameras using a custom 3D object covered with ChArUco markers. Noise models are then discussed, and a residual-learning convolutional neural network is shown to greatly reduce sensor noise. When merging the results from several cameras, a novel refinement step is applied with a PointNet-like neural network constrained to shift 3D points along their viewing rays. This yields a correction of the depth map that preserves its pixel structure while harnessing the properties of natural 3D surfaces and observations from other cameras. Combined with the convolutional preprocessing and flying-pixel removal, this approach is shown to outperform state-of-the-art noise removal methods in both the depth map and 3D domains.
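To illustrate the ray constraint, the sketch below backprojects a depth map through an assumed pinhole camera and applies a per-pixel correction along each viewing ray; since every point on a ray is fully determined by its depth, the shift reduces to an update of the depth map itself. The intrinsics, shapes, and function names are hypothetical, not taken from the thesis.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) into camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))], axis=-1)
    return rays * depth[..., None]  # each point lies on its pixel's viewing ray

def refine_along_rays(depth, correction):
    """Shift each 3D point along its viewing ray.

    `correction` (H, W) stands in for the output of the PointNet-like
    network; because a point on a viewing ray is fully determined by its
    depth, the shift reduces to adding the correction to the depth map,
    which preserves the pixel structure.
    """
    return depth + correction

# Toy usage: a noisy plane at z = 1 m, corrected with an oracle residual.
depth_noisy = 1.0 + 0.01 * np.random.randn(480, 640)
points_noisy = backproject(depth_noisy, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
depth_refined = refine_along_rays(depth_noisy, 1.0 - depth_noisy)
```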
In the second part of the thesis, the properties of light field systems are discussed, and a new geometrical model is proposed for calibrating the microlens arrays in modern light field cameras. Unlike previous works, lens distortion parameters are added to the description of each microlens, leading to a non-constant baseline in the virtual camera array. The calibrated model is shown to outperform the state of the art when applied to stereo-matching depth estimation. The topic of depth estimation is studied further by showcasing a new 3D-convolution-based neural network applied successfully to synthetic light field datasets. Its main advantage is a significant reduction in the number of training parameters, achieved by treating the camera index as a third dimension and exploiting its isotropy. Finally, a motorized 2-DOF device for spherical light field acquisition is presented and calibrated with a 3D object similar to the one described above for multiview depth systems. Global optimization of the sphere and camera parameters leads to sub-pixel accuracy and high-quality depth estimation. These results are confirmed by comparing a captured image with its reconstruction from neighboring virtual cameras using depth-based view synthesis.
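To make the parameter saving of the 3D-convolution design concrete, the sketch below compares a 2D convolution that stacks light field views as input channels against a 3D convolution that treats the view index as a depth axis and shares one kernel along it. The shapes and layer sizes are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

V, C, H, W = 9, 1, 64, 64          # views, channels per view, spatial size

# Baseline: views stacked as channels, so the kernel grows with V.
conv2d = nn.Conv2d(V * C, 32, kernel_size=3, padding=1)

# View axis as depth: one 3x3x3 kernel slides across neighboring views,
# sharing weights along the (isotropic) camera-index dimension.
conv3d = nn.Conv3d(C, 32, kernel_size=3, padding=1)

n2d = sum(p.numel() for p in conv2d.parameters())  # 32*9*3*3 + 32 = 2624
n3d = sum(p.numel() for p in conv3d.parameters())  # 32*1*3*3*3 + 32 = 896
print(f"2D (views as channels): {n2d} params")
print(f"3D (views as depth):    {n3d} params")

lf = torch.randn(1, C, V, H, W)    # a row of cameras treated as a depth axis
out = conv3d(lf)                   # shape (1, 32, V, H, W) with padding=1
```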