MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

Cited by: 8
Authors
Zhou, Hongyu [1]
Ge, Zheng [1]
Li, Zeming [1]
Zhang, Xiangyu [1]
Affiliations
[1] MEGVII Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China
DOI
10.1109/ICCV51070.2023.00785
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. Existing view transformers either suffer from poor efficiency or rely on device-specific operators, hindering the broad application of BEV models. In contrast, our method generates BEV features efficiently with only convolutions and matrix multiplications (MatMul). Specifically, we propose describing the BEV feature as the MatMul of image feature and a sparse Feature Transporting Matrix (FTM). A Prime Extraction module is then introduced to compress the dimension of image features and reduce FTM's sparsity. Moreover, we propose the Ring & Ray Decomposition to replace the FTM with two matrices and reformulate our pipeline to reduce calculation further. Compared to existing methods, MatrixVT enjoys a faster speed and less memory footprint while remaining deploy-friendly. Extensive experiments on nuScenes and Waymo benchmarks demonstrate that our method is highly efficient but obtains results on par with the SOTA method in object detection and map segmentation tasks.
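The core idea in the abstract, writing the BEV feature as a matrix product of the image feature and a sparse Feature Transporting Matrix, can be illustrated with a toy sketch. This is not the authors' implementation: it omits Prime Extraction and the Ring & Ray Decomposition, and every name here (`build_ftm`, `cell_of_point`, `scatter_add`) is hypothetical. It assumes each lifted image point is assigned to exactly one BEV cell, and only checks that the matrix product matches an explicit scatter-add.

```python
import random


def build_ftm(cell_of_point, num_cells):
    """Build a one-hot transporting matrix of shape (num_cells, num_points).

    cell_of_point[i] is the BEV cell index that image point i falls into.
    In a real model this mapping would come from camera geometry; here it
    is simply given (assumption for illustration).
    """
    ftm = [[0.0] * len(cell_of_point) for _ in range(num_cells)]
    for point, cell in enumerate(cell_of_point):
        ftm[cell][point] = 1.0
    return ftm


def matmul(a, b):
    """Plain dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scatter_add(features, cell_of_point, num_cells):
    """Reference path: accumulate each point's feature into its BEV cell."""
    channels = len(features[0])
    bev = [[0.0] * channels for _ in range(num_cells)]
    for feat, cell in zip(features, cell_of_point):
        for c in range(channels):
            bev[cell][c] += feat[c]
    return bev


random.seed(0)
num_points, num_cells, channels = 6, 4, 3
# Toy image features: one channel vector per lifted image point.
features = [
    [float(random.randint(0, 9)) for _ in range(channels)]
    for _ in range(num_points)
]
# Toy point-to-cell assignment; cell 3 receives no points.
cell_of_point = [0, 2, 2, 1, 0, 2]

ftm = build_ftm(cell_of_point, num_cells)
bev_matmul = matmul(ftm, features)        # BEV = FTM x image features
bev_scatter = scatter_add(features, cell_of_point, num_cells)
```

In this toy, `ftm` has a single nonzero entry per column, so most of the dense product is multiplication by zero. That wasted work is the sparsity the abstract says the method sets out to reduce.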
Pages: 8514-8523
Page count: 10