A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Times cited: 2

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliation:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Theory of Artificial Intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

As two of the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often few and pre-defined, so state-of-the-art fully supervised detectors and segmentors fail to generalize beyond this closed vocabulary. To resolve this limitation, the community has in the last few years paid increasing attention to Open-Vocabulary Detection (OVD) and Open-Vocabulary Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond the pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. We first develop a taxonomy to organize the different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, clearly discriminates between methodologies, which include visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. We thoroughly analyze the main design principles, key challenges, development routes, and the strengths and weaknesses of each methodology.
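To make the definition of "open-vocabulary" concrete, the sketch below illustrates the general idea behind visual-semantic space mapping, one of the methodology families named in the abstract: a region's visual embedding is scored against text embeddings of category names, so extending the vocabulary means adding names rather than retraining a fixed classifier head. This is a minimal sketch under stated assumptions, not the survey's or any specific paper's method. The 3-D vectors are hand-picked toy stand-ins for the outputs of a pretrained vision-language encoder, and the function names and the temperature of 0.01 are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def classify_region(
    region_embedding: Sequence[float],
    text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # hypothetical value; real systems tune or learn it
) -> Dict[str, float]:
    """Score one region against an arbitrary vocabulary of category names.

    Each category's text embedding acts as a classifier weight, so the
    vocabulary can change at inference time without retraining.
    """
    region = l2_normalize(region_embedding)
    names = list(text_embeddings)
    logits = [
        sum(r * t for r, t in zip(region, l2_normalize(text_embeddings[name]))) / temperature
        for name in names
    ]
    # Numerically stable softmax over the current vocabulary.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Toy embeddings (hypothetical stand-ins for a text encoder's outputs).
base_vocab = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
open_vocab = {**base_vocab, "zebra": [0.0, 0.0, 1.0]}  # novel category added by name only
region = [0.1, 0.05, 0.9]  # toy visual embedding of a region depicting a zebra

closed = classify_region(region, base_vocab)
opened = classify_region(region, open_vocab)
print(max(closed, key=closed.get))  # cat
print(max(opened, key=opened.get))  # zebra
```

With only the base vocabulary, the region is forced into the nearest known class ("cat"); adding the name "zebra" lets the same frozen embeddings recognize it without any change to the scoring function.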

Pages: 8954-8975

Number of pages: 22
