OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Cited by: 0

Authors:

Zhang, Hu [1]

Ku, Jianhua [4]

Tang, Tao [5]

Sun, Haiyang [6]

Huang, Xin [2]

Huang, Zi [2]

Yu, Kaicheng [3]

Affiliations:

[1] CSIRO DATA61, Sydney, NSW, Australia

[2] Univ Queensland, Brisbane, Qld, Australia

[3] Westlake Univ, Hangzhou, Peoples R China

[4] Alibaba, DAMO Acad, Beijing, Peoples R China

[5] Sun Yat Sen Univ, Shenzhen Campus, Shenzhen, Peoples R China

[6] LiAuto Inc, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2024, PT LXXXIV | 2025 / Vol. 15142

Keywords:

OpenSight; Open-vocabulary; 3D object detection; VOXELNET;

DOI:

10.1007/978-3-031-72907-2_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditional LiDAR-based object detection research focuses primarily on closed-set scenarios, which fall short in complex real-world applications. Directly transferring existing 2D open-vocabulary models with some known LiDAR classes for open-vocabulary ability, however, tends to suffer from overfitting: the resulting model will detect the known objects even when presented with a novel category. In this paper, we propose OpenSight, a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection. OpenSight utilizes 2D-3D geometric priors for the initial discernment and localization of generic objects, followed by a more specific semantic interpretation of the detected objects. The process begins by generating 2D boxes for generic objects from the camera images that accompany the LiDAR data. These 2D boxes, together with LiDAR points, are then lifted back into the LiDAR space to estimate corresponding 3D boxes. For better generic object perception, our framework integrates both temporal and spatial-aware constraints. Temporal awareness correlates the predicted 3D boxes across consecutive timestamps, recalibrating the missed or inaccurate boxes. The spatial awareness randomly places some "precisely" estimated 3D boxes at varying distances, increasing the visibility of generic objects. To interpret the specific semantics of detected objects, we develop a cross-modal alignment and fusion module to first align 3D features with 2D image embeddings and then fuse the aligned 3D-2D features for semantic decoding. Our experiments indicate that our method establishes state-of-the-art open-vocabulary performance on widely used 3D detection benchmarks and effectively identifies objects of new categories of interest.
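
The abstract does not say how a 3D box is estimated from a 2D box and the LiDAR points, so the following is only a simplified illustration of the general lifting idea, not the paper's method. It assumes the points are already expressed in the camera frame (z pointing forward) and that a pinhole camera model applies. The names `project` and `lift_box`, and the intrinsics `fx`, `fy`, `cx`, `cy`, are hypothetical and chosen for this sketch.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]         # (x, y, z) in the camera frame (assumed)
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels
Box3D = Tuple[Point, Point]                # (min corner, max corner), axis-aligned


def project(point: Point, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-frame point; None if it lies behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def lift_box(points: List[Point], box: Box2D, fx: float, fy: float,
             cx: float, cy: float) -> Optional[Box3D]:
    """Keep the points whose projection falls inside the 2D box, then bound them in 3D."""
    u_min, v_min, u_max, v_max = box
    inside = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
            inside.append(p)
    if not inside:
        return None  # no LiDAR evidence for this 2D box
    lo = tuple(min(p[i] for p in inside) for i in range(3))
    hi = tuple(max(p[i] for p in inside) for i in range(3))
    return (lo, hi)


# Toy usage: four points, one of them behind the camera.
cloud = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -1.0)]
box3d = lift_box(cloud, (40.0, 40.0, 70.0, 60.0), fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

A practical pipeline would also need the LiDAR-to-camera extrinsic transform and some way of discarding background points that fall inside the same viewing frustum. Both are omitted here.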


Pages: 1-19

Page count: 19
