A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024, Vol. 46, No. 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.

Pages: 8954-8975

Page count: 22

