Publication Details
Overview
Shuo Yang, Anh Tuan Luu, Xuan-son Nguyen, Aymeric Histace, Bart Jansen, Hichem Sahli
 

Contribution to journal

Abstract

2D-to-3D lifting is a fundamental approach in 3D human pose estimation (3DHPE). This task is crucial for applications such as motion analysis and virtual reality. While Graph Convolutional Networks (GCNs) have demonstrated effectiveness in capturing spatial relationships in human skeletons, they suffer from over-smoothing and limited receptive fields. Transformer-based models provide global context but struggle with local feature extraction and computational efficiency. To address these challenges, we propose ADGT, a novel parallel GCN-transformer architecture that combines the strengths of both approaches. Our method introduces three key innovations: a Hop-Wise Scalable Adaptive GCN to refine local feature extraction, an Attention-Based Local Feature Extractor to enhance the integration of local and global representations, and a Register-Based Transformer Enhancement to improve feature separation. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that ADGT achieves state-of-the-art performance among frame-based methods while maintaining computational efficiency. These results highlight the potential of ADGT for real-time applications requiring accurate and efficient 3DHPE.
