MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

Times cited: 8

Authors:

Zhou, Hongyu [1]

Ge, Zheng [1]

Li, Zeming [1]

Zhang, Xiangyu [1]

Affiliation:

[1] MEGVII Technol, Beijing, Peoples R China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

Funding:

National Key Research and Development Program of China

DOI:

10.1109/ICCV51070.2023.00785

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. Existing view transformers either suffer from poor efficiency or rely on device-specific operators, hindering the broad application of BEV models. In contrast, our method generates BEV features efficiently with only convolutions and matrix multiplications (MatMul). Specifically, we propose describing the BEV feature as the MatMul of the image feature and a sparse Feature Transporting Matrix (FTM). A Prime Extraction module is then introduced to compress the dimension of image features and reduce the FTM's sparsity. Moreover, we propose the Ring & Ray Decomposition to replace the FTM with two matrices and reformulate our pipeline to reduce computation further. Compared to existing methods, MatrixVT runs faster and has a smaller memory footprint while remaining deployment-friendly. Extensive experiments on the nuScenes and Waymo benchmarks demonstrate that our method is highly efficient while obtaining results on par with the state-of-the-art (SOTA) method on object detection and map segmentation tasks.
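The abstract's core idea, expressing the BEV feature as a MatMul between a sparse Feature Transporting Matrix (FTM) and the (prime) image feature, and then factoring the FTM into a Ring matrix and a Ray matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single camera, a hypothetical 90-degree pinhole geometry, a plain max over the image height as a placeholder for the Prime Extraction module, and a cell-centric assignment in which every BEV cell belongs to exactly one image column (ray) and one depth bin (ring). Under that last assumption the FTM factors exactly into an element-wise product of the two indicator matrices. All sizes and the names `Ray`, `Ring`, `FTM`, `CELL`, and `MAX_DEPTH` are illustrative.

```python
import numpy as np

# Toy sizes (hypothetical, for illustration only).
C, H, W = 4, 6, 8            # channels, image height, image width (one camera)
D = 5                        # depth bins
X, Y = 10, 10                # BEV grid cells
CELL, MAX_DEPTH = 1.0, 8.0   # metres per BEV cell, depth range (hypothetical)

rng = np.random.default_rng(0)
img_feat = rng.standard_normal((C, H, W))
logits = rng.standard_normal((D, H, W))
depth_prob = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Placeholder for Prime Extraction: collapse the height axis with a max.
feat_prime = img_feat.max(axis=1)      # (C, W)
depth_prime = depth_prob.max(axis=1)   # (D, W)

# Prime frustum feature: depth-weighted context for every (column, depth bin).
frustum = np.einsum("dw,cw->wdc", depth_prime, feat_prime)  # (W, D, C)

# Toy geometry: camera at the BEV origin looking along +y with a 90-degree FOV.
# Each BEV cell centre is assigned to the column (ray) and depth bin (ring)
# that contain it; cells outside the FOV or depth range get no assignment.
xs = (np.arange(X) - X / 2 + 0.5) * CELL
ys = (np.arange(Y) + 0.5) * CELL
gx, gy = np.meshgrid(xs, ys, indexing="ij")                 # (X, Y)
angle = np.arctan2(gx, gy)                                  # 0 = optical axis
col = np.floor((angle + np.pi / 4) / (np.pi / 2) * W).astype(int).ravel()
dbin = np.floor(np.hypot(gx, gy) / (MAX_DEPTH / D)).astype(int).ravel()
valid = (col >= 0) & (col < W) & (dbin < D)
idx = np.nonzero(valid)[0]

# Full sparse FTM: row = BEV cell, column = flattened (w, d) index w * D + d.
FTM = np.zeros((X * Y, W * D))
FTM[idx, col[idx] * D + dbin[idx]] = 1.0
bev_ftm = FTM @ frustum.reshape(W * D, C)                   # (X*Y, C)

# Ring & Ray factors: FTM[p, w*D + d] == Ray[p, w] * Ring[p, d].
Ray = np.zeros((X * Y, W))
Ring = np.zeros((X * Y, D))
Ray[idx, col[idx]] = 1.0
Ring[idx, dbin[idx]] = 1.0
ftm_from_rr = (Ray[:, :, None] * Ring[:, None, :]).reshape(X * Y, W * D)
bev_rr = np.einsum("pw,pd,wdc->pc", Ray, Ring, frustum)     # (X*Y, C)

bev = bev_rr.reshape(X, Y, C)
print(FTM.size, Ray.size + Ring.size)  # prints: 4000 1300
```

Both paths produce the same BEV feature, but the two factors store X·Y·(W + D) entries instead of the X·Y·W·D entries of the full FTM (1,300 vs. 4,000 at these toy sizes), which illustrates why the decomposition reduces memory. How the paper orders the reformulated MatMuls for speed is not reproduced here.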

Pages: 8514–8523

Page count: 10