Publication Details

IEEE Transactions on Instrumentation and Measurement

Contribution To Journal


Synthetic traffic datasets provide highly accurate and affordable annotations, which are of crucial importance in complex vision-based perception tasks performed on real-world traffic data. Because paired 2-D–3-D data are scarce, adapting knowledge of a vehicle's pose in SE(3), given its known 3-D geometry, remains very challenging. In this article, we first propose a synthetic dataset, SynthV6D, which enables 6-D pose estimation of vehicles in monocular traffic images. The dataset comprises industrial-grade vehicles moving through realistic virtual scenery and covers a wide range of viewpoints and distances. Second, we introduce a weakly supervised domain adaptation approach, dubbed W6DNet, to recover the 6-D pose. To this end, we use the synthetic dataset to introduce a novel domain adaptation based on a linked image feature space. Furthermore, we propose an original two-step double-fusion block to fuse the multi-modal data representations and the cross-space features, so that the proposed method learns pose-specific embeddings. We evaluate W6DNet on the real-world ApolloCar3D dataset. Extensive experimental results demonstrate that, when a small amount of real-world data is accessible, the proposed approach significantly improves performance by adapting knowledge from SynthV6D. Moreover, it achieves performance competitive with fully supervised state-of-the-art methods. The code is available at .
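
For readers unfamiliar with the term, a 6-D pose in SE(3) is a rotation (three degrees of freedom) plus a translation (three degrees of freedom) that places the vehicle's 3-D model in the camera frame. The sketch below is only an illustration of that idea and is not taken from W6DNet or SynthV6D: the function names, the Z-Y-X Euler angle convention, the pinhole camera model, and all numeric values are assumptions chosen for the example.

```python
import math

def rotation_from_euler(rx, ry, rz):
    """Return a 3x3 rotation matrix R = Rz(rz) * Ry(ry) * Rx(rx), angles in radians.

    The Z-Y-X composition order is an assumption made for this sketch.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform(R, t, p):
    """Map a model-frame point p into the camera frame: R * p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(R, t, p, fx, fy, cx, cy):
    """Project a model-frame point to pixel coordinates with a pinhole camera."""
    x, y, z = transform(R, t, p)
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical pose: vehicle turned 90 degrees about the camera's y axis,
# 20 m in front of the camera and 1.5 m below the optical axis.
R = rotation_from_euler(0.0, math.pi / 2, 0.0)
t = [0.0, 1.5, 20.0]

# Hypothetical model point 2 m ahead of the vehicle origin (e.g. near the front bumper).
front_point = [0.0, 0.0, 2.0]

# Hypothetical intrinsics: focal lengths and principal point in pixels.
u, v = project(R, t, front_point, 1000.0, 1000.0, 640.0, 360.0)
print(u, v)
```

Estimating the 6-D pose from a monocular image amounts to recovering `R` and `t` so that the projected model points agree with what is observed in the image; how W6DNet does this is described in the article itself.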
