Publication Details
 
 
Diana-Alexandra Sas, Leandro Di Bella, Yangxintong Lyu, Florin Oniga, Adrian Munteanu
 

Chapter in Book/Report/Conference proceeding

Abstract 

Since the introduction of the self-attention mechanism and the adoption of the Transformer architecture for Computer Vision tasks, Vision Transformer-based architectures have gained considerable popularity in the field, being used for tasks such as image classification, object detection and image segmentation. However, how to efficiently leverage the attention mechanism of vision transformers for the Monocular 3D Object Detection task remains an open question. In this paper, we present LAM3D, a framework that Leverages the self-Attention Mechanism for Monocular 3D object Detection. To do so, the proposed method uses a Pyramid Vision Transformer v2 (PVTv2) as its feature extraction backbone, together with 2D/3D detection machinery. We evaluate the proposed method on the KITTI 3D Object Detection Benchmark, demonstrating the applicability of the proposed solution in the autonomous driving domain and outperforming reference methods. Moreover, owing to its use of self-attention, LAM3D systematically outperforms the equivalent architecture that does not employ self-attention.
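The abstract does not detail LAM3D's attention layers. The sketch below is a minimal, pure-Python illustration of single-head scaled dot-product self-attention over the tokens of a flattened feature map. It includes an optional average-pooling reduction of keys and values, loosely inspired by the spatial-reduction attention used in PVTv2. The function names, the reduction factor `r` and the identity projection weights in the usage lines are illustrative assumptions; this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (a: n x d, b: d x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def avg_pool_tokens(tokens: Matrix, height: int, width: int, r: int) -> Matrix:
    """Average-pool a row-major (height*width) x C token grid by factor r.
    Assumes r divides both height and width (illustrative simplification)."""
    channels = len(tokens[0])
    pooled = []
    for i in range(0, height, r):
        for j in range(0, width, r):
            block = [tokens[(i + di) * width + (j + dj)]
                     for di in range(r) for dj in range(r)]
            pooled.append([sum(t[k] for t in block) / len(block)
                           for k in range(channels)])
    return pooled


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                   height: int, width: int, r: int = 1) -> Matrix:
    """Single-head scaled dot-product self-attention over feature-map tokens.
    With r > 1, keys and values come from an r x r average-pooled grid,
    which shrinks the attention matrix from N x N to N x (N / r^2)."""
    kv_src = x if r == 1 else avg_pool_tokens(x, height, width, r)
    q = matmul(x, wq)
    k = matmul(kv_src, wk)
    v = matmul(kv_src, wv)
    scale = 1.0 / math.sqrt(len(wq[0]))  # 1 / sqrt(d_k)
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)  # each output token is a weighted mix of values


# Usage: a hypothetical 4x4 feature map with 2 channels and identity projections.
eye = [[1.0, 0.0], [0.0, 1.0]]
feature_tokens = [[float(i), float(i % 4)] for i in range(16)]
out = self_attention(feature_tokens, eye, eye, eye, height=4, width=4, r=2)
print(len(out), len(out[0]))  # 16 2
```

Because every query attends to every (pooled) key, each output token can aggregate context from the whole image, which is the property the abstract credits for the gain over the equivalent non-attention architecture. Pooling keys and values is one common way to keep that global attention affordable on high-resolution feature maps.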
