BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo

Cited by: 0

Authors:

Li, Yinhao ^{[1,3]}

Bao, Han ^{[2,3]}

Ge, Zheng ^{[4]}

Yang, Jinrong ^{[5]}

Sun, Jianjian ^{[4]}

Li, Zeming ^{[4]}

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing, Peoples R China

[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing, Peoples R China

[3] Univ Chinese Acad Sci, Beijing, Peoples R China

[4] MEGVII Technol, Beijing, Peoples R China

[5] Huazhong Univ Sci & Technol, Wuhan, Peoples R China

Source:

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Limited by their depth perception ability, all multi-view 3D object detection methods are bottlenecked by depth accuracy. Depth estimation based on temporal stereo is quite reliable in indoor scenarios. However, directly integrating temporal stereo into outdoor multi-view 3D object detectors poses two difficulties: 1) constructing temporal stereo for all views incurs high computing costs, and 2) temporal stereo cannot adapt to challenging outdoor scenarios. In this study, we propose an effective method for creating temporal stereo by dynamically determining the center and range of the temporal stereo. The most confident center is found using the EM algorithm. Extensive experiments on nuScenes show BEVStereo's ability to deal with complex outdoor scenarios that other stereo-based methods are unable to handle. For the first time, a stereo-based approach shows superiority in scenarios such as a static ego vehicle and moving objects. BEVStereo achieves a new state of the art in the camera-only track of the nuScenes dataset while maintaining memory efficiency. Code has been released.

Pages: 1486-1494

Page count: 9
