Predicting the 6DoF pose of vehicles from a single view image without additional constraints remains an ill-posed problem. Current monocular approaches require expensive and time-consuming annotations of vehicle-specific feature points and/or the 2D-3D feature correspondences. In this paper, we propose a novel monocular approach for vehicle pose estimation in SE(3), dubbed Mono6D, that uses vehicle 3D priors provided by vehicle make-and-model recognition methods to estimate the 6D pose. The proposed method mainly consists of: 1) a two-separate-branch module to learn multi-modal representations; 2) a fusion schema to learn pose-specific representative embeddings. The experimental results show that the proposed method is superior to the state-of-the-art approaches in both objective and subjective terms.
Lyu, Y, Royen, RD & Munteanu, A 2022, Mono6D: Monocular Vehicle 6D Pose Estimation with 3D Priors. in IEEE International Conference on Image Processing (ICIP). Proceedings - International Conference on Image Processing, ICIP, IEEE, Bordeaux, France, pp. 2187-2191. https://doi.org/10.1109/ICIP46576.2022.9897311
Lyu, Y., Royen, R. D., & Munteanu, A. (2022). Mono6D: Monocular Vehicle 6D Pose Estimation with 3D Priors. In IEEE International Conference on Image Processing (ICIP) (pp. 2187-2191). (Proceedings - International Conference on Image Processing, ICIP). IEEE. https://doi.org/10.1109/ICIP46576.2022.9897311
@inproceedings{7883f24f8b7c494b874ab5352733724c,
title = "{Mono6D}: Monocular Vehicle {6D} Pose Estimation with {3D} Priors",
abstract = "Predicting the 6DoF pose of vehicles from a single view image without additional constraints remains an ill-posed problem. Current monocular approaches require expensive and time-consuming annotations of vehicle-specific feature points and/or the 2D-3D feature correspondences. In this paper, we propose a novel monocular approach for vehicle pose estimation in SE(3), dubbed Mono6D, that uses vehicle 3D priors provided by vehicle make-and-model recognition methods to estimate the 6D pose. The proposed method mainly consists of: 1) a two-separate-branch module to learn multi-modal representations; 2) a fusion schema to learn pose-specific representative embeddings. The experimental results show that the proposed method is superior to the state-of-the-art approaches in both objective and subjective terms.",
author = "Lyu, Yangxintong and Royen, {Remco Donovan} and Munteanu, Adrian",
note = "Funding Information: This work is funded by Innoviris within the research project DRIvINg and by the Fonds Wetenschappelijk Onderzoek (FWO) - 1S89420N. Publisher Copyright: {\textcopyright} 2022 IEEE.",
year = "2022",
doi = "10.1109/ICIP46576.2022.9897311",
language = "English",
isbn = "978-1-6654-9621-6",
series = "Proceedings - International Conference on Image Processing, ICIP",
publisher = "IEEE",
pages = "2187--2191",
booktitle = "IEEE International Conference on Image Processing (ICIP)",
}