Abstract 

The field of 3D technology is attracting considerable academic and industrial interest due to its expanding range of potential applications. Noteworthy domains include, but extend beyond, automotive, gaming, extended reality, drone inspection, robotics, medical imaging, and 3D modeling and design. An essential aspect of these applications is 3D scene understanding. Point clouds, comprising collections of points, play a crucial role in capturing the spatial information of physical environments. The result is a lightweight and precise 3D representation that preserves fine details and enables efficient integration with real-world data.

In recent years, deep learning has gained widespread use, demonstrating its significance across various domains. Its ability to automatically learn intricate patterns from vast datasets has had a transformative impact, driving advancements in technology and reshaping the landscape of artificial intelligence applications. The ongoing development of increasingly sophisticated neural network architectures continues to push the boundaries of what is achievable across diverse sectors.

As a result, deep learning has become ubiquitous. For point cloud processing, however, some important limitations to widespread applicability remain. This thesis delves into four crucial challenges associated with deep learning-based point cloud processing: (i) the precise labeling of extensive datasets, (ii) the model complexity requirements, (iii) the latency introduced during inference, and (iv) the concept of scalability. The first challenge stems from the necessity for extensive datasets with highly accurate annotations. Particularly in the 3D domain, obtaining such high-quality annotations proves challenging and, consequently, expensive. The second challenge arises from the development of ever more intricate and memory-intensive methods, facilitated by advancements in power-hungry graphics cards. While these techniques achieve higher performance levels, they impose constraints on deployment, particularly for embedded devices. Furthermore, the escalating complexity of these networks is accompanied by increased inference time, impeding real-time applications. Lastly, deep learning-based solutions lack the concept of scalability, which has proven vital in traditional methods.

In this thesis, we tackle these challenges and propose diverse solutions within the deep learning paradigm. The thesis commences with the introduction of a rapid 3D LiDAR simulator, designed to mitigate the labeling problem by learning from perfectly annotated synthetic data. We demonstrate its applications in 3D denoising and semantic segmentation. A second contribution lies in the domain of point cloud instance segmentation. Through the joint learning of prototypes and coefficients, we present an efficient and rapid method that requires relatively little GPU memory. To further improve this method, we introduce an enhanced block-merging algorithm. As a third main contribution, we achieve deep learning-based quality scalability by learning embedded latent representations, demonstrating compelling results in applications such as image reconstruction, point cloud compression, and image semantic hashing. The final contribution introduces resolution-scalable 3D semantic segmentation of point clouds. When applied to resolution-scalable 3D sensors, it enables joint point cloud acquisition and processing.

Our proposed methods consistently outperform established benchmarks across diverse datasets, as demonstrated through comprehensive experimentation. The research findings have been disseminated in various reputable journals and conferences, and have led to a patent submission, highlighting their impact in both academic and industrial contexts.