A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2

Authors:

Zhu, Chaoyang [1]

Chen, Long [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 12

Keywords:

Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;

DOI:

10.1109/TPAMI.2024.3413013

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.

Citation

Pages: 8954 - 8975

Page count: 22

50 items in total

[41] SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Xie, Bin
Cao, Jiale
Xie, Jin
Khan, Fahad Shahbaz
Pang, Yanwei
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 3426 - 3436
[42] Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Jiao, Siyu
Zhu, Hongguang
Huang, Jiannan
Zhao, Yao
Wei, Yunchao
Shi, Humphrey
COMPUTER VISION - ECCV 2024, PT XXXIII, 2025, 15091 : 399 - 416
[43] Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
Han, Cong
Zhong, Yujie
Li, Dengjie
Han, Kai
Ma, Lin
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1086 - 1096
[44] Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Ghiasi, Golnaz
Gu, Xiuye
Cui, Yin
Lin, Tsung-Yi
COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 540 - 557
[45] CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Cho, Seokju
Shin, Hoeseong
Hong, Sunghwan
Arnab, Anurag
Seo, Paul Hongsuck
Kim, Seungryong
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 4113 - 4123
[46] How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
Yao, Yiyang
Liu, Peng
Zhao, Tiancheng
Zhang, Qianqian
Liao, Jiajia
Fang, Chunxin
Lee, Kyusong
Wang, Qing
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6630 - 6638
[47] Class Enhancement Losses With Pseudo Labels for Open-Vocabulary Semantic Segmentation
Dao, Son Duy
Shi, Hengcan
Phung, Dinh
Cai, Jianfei
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8442 - 8453
[48] Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Xu, Jiarui
Liu, Sifei
Vahdat, Arash
Byeon, Wonmin
Wang, Xiaolong
De Meo, Shalini
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2955 - 2966
[49] A survey on face detection in the wild: Past, present and future
Zafeiriou, Stefanos
Zhang, Cha
Zhang, Zhengyou
COMPUTER VISION AND IMAGE UNDERSTANDING, 2015, 138 : 1 - 24
[50] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Liu, Mingxuan
Hayes, Tyler L.
Ricci, Elisa
Csurka, Gabriela
Volpi, Riccardo
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 16634 - 16644
