Event
Public PhD defence of Xinxin Dai, on October 08
 
 

On October 8th 2025 at 16:00, Xinxin Dai will defend their PhD thesis, entitled “LEARNING BASED RECONSTRUCTION AND MEASUREMENT OF 3D HANDS USING A SINGLE DEPTH CAMERA”.

Everybody is invited to attend the presentation in room D.2.01 or online via this link.

Abstract 

Accurate 3D reconstruction and measurement extraction of the human hand are critical for a wide range of hand-centric applications, such as the design of immobilization devices, prosthetic limb fabrication, and osteoarthritis evaluation. However, the recovery of high-fidelity hand geometry remains challenging due to the inherently incomplete and occluded nature of point clouds acquired from commodity depth sensors, which are limited by viewpoint constraints and self-occlusion. Furthermore, traditional manual measurement methods, which require static hand postures and the expertise of trained anthropometrists, are inadequate for capturing measurements under realistic, task-specific hand motions, limiting their applicability in dynamic or non-standard scenarios.

To address these limitations, this thesis introduces deep learning-based methodologies for the reconstruction and measurement extraction of 3D hand shapes. Specifically, the main research challenges are: (i) What is the optimal hand posture for precise and reliable measurement? (ii) How can a complete hand shape be reconstructed quickly and precisely from multi-view partial point clouds under different postures? (iii) How can partial point clouds be completed and their surfaces reconstructed simultaneously while preserving the raw data? and (iv) How can humans be identified from the shape and posture of their hands? The first challenge derives from the complexity of the human hand, which comprises 34 muscles and 27 bones. This intricate structure enables a wide range of postural variations, often producing significant geometric deformations that introduce considerable biases in measurement accuracy. Second, depth cameras inherently capture only partial point clouds due to limited viewpoints and self-occlusions, yielding incomplete representations that hinder accurate reconstruction of the full hand geometry. Third, the lack of high-resolution surface detail in a single partial point cloud makes it challenging to achieve both point cloud completion and high-fidelity surface reconstruction at the same time. Lastly, previous studies on human identification have primarily focused on recording the velocities of pressing and releasing keys, and these approaches lack integration with vision-based hand motion analysis.

To overcome the aforementioned challenges, this thesis introduces four deep learning-based models. The first, Measure4DHand, is designed for the automatic extraction of dynamic hand measurements from sequences of partial hand point clouds. By analyzing how measurement values vary with skin deformation across different hand postures, the model helps identify the optimal hand postures for accurate and consistent measurements. The second, PatientHandNet, reconstructs a high-fidelity 3D hand shape in a canonical open-palm pose from four depth images captured from different viewpoints with a single commodity depth sensor. To support this model, two datasets are introduced: a large-scale multi-view synthetic dataset covering a wide variety of hand shapes and poses with corresponding ground-truth hand shapes in the canonical open-palm pose, and a novel real-world dataset of 18 subjects (13 males and 5 females) captured with a Structure Sensor Mark I mounted on an iPad, for which a professional anthropometrist was hired to obtain ground-truth hand biometric measurements. The third contribution is TailoredTemplateFit, to the best of our knowledge the first deep learning-based method in the literature to simultaneously address point cloud completion and surface reconstruction while preserving the raw data of the input. This model is trained and validated on two large-scale datasets, a 50K head dataset and a 300K hand dataset, each with a wide variety of shapes and poses and corresponding ground-truth shapes. Lastly, we present KD-Net, which explores keystroke dynamics as a novel visual modality for human identification from RGB-D image sequences. To support this research, a new dataset dubbed KD-MultiModal was created, comprising 243.2K frames of paired RGB and depth images.
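As a rough illustration of the measurement-extraction idea behind Measure4DHand (a toy sketch, not the thesis's actual pipeline), the following minimal Python snippet tracks one hand measurement, here a hypothetical distance between two landmark point indices, across a sequence of point-cloud frames and flags the posture whose value is closest to the sequence median. All names, landmark indices, and the synthetic data are assumptions made for illustration.

import numpy as np

def landmark_distance(points: np.ndarray, idx_a: int, idx_b: int) -> float:
    """Euclidean distance between two landmark points of one frame's point cloud."""
    return float(np.linalg.norm(points[idx_a] - points[idx_b]))

def measurement_over_sequence(frames: list[np.ndarray], idx_a: int, idx_b: int):
    """Track one hand measurement across a posture sequence and summarise its spread."""
    values = np.array([landmark_distance(f, idx_a, idx_b) for f in frames])
    # Treat the frame whose value lies closest to the sequence median as the
    # posture least biased by skin deformation for this particular measurement.
    best = int(np.argmin(np.abs(values - np.median(values))))
    return values, best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 10 frames of 1,000 points each with mild
    # per-frame deformation noise; real input would come from a depth sensor.
    base = rng.normal(size=(1000, 3))
    frames = [base + 0.01 * rng.normal(size=base.shape) for _ in range(10)]
    values, best = measurement_over_sequence(frames, idx_a=0, idx_b=1)
    print(f"per-frame values: {np.round(values, 4)}")
    print(f"most stable posture: frame {best}")

In practice, the landmarks would come from a fitted hand model rather than fixed point indices, and the stability criterion would be validated against an anthropometrist's ground-truth measurements, as done with the real-world dataset described above.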

Our proposed methods consistently outperform reference methods from the literature, as demonstrated through comprehensive experiments. This research has been published in reputable journals and conference proceedings, highlighting its impact in both academic and industrial contexts.

 
 