A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024, Vol. 46, No. 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, the community has in the last few years paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, discriminates well between methodologies, which include: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.


Pages: 8954-8975

Page count: 22
