Publication Details

IEEE International Conference on Image Processing (ICIP)

Contribution To Book Anthology


Predicting the 6DoF pose of vehicles from a single-view image without additional constraints remains an ill-posed problem. Current monocular approaches require expensive and time-consuming annotations of vehicle-specific feature points and/or 2D-3D feature correspondences. In this paper, we propose a novel monocular approach for vehicle pose estimation in SE(3), dubbed Mono6D, that uses vehicle 3D priors provided by vehicle make-and-model recognition methods to estimate the 6D pose. The proposed method consists mainly of 1) a module with two separate branches that learns multi-modal representations and 2) a fusion scheme that learns pose-specific representative embeddings. Experimental results show that the proposed method outperforms state-of-the-art approaches in both objective and subjective evaluations.
