Open-Vocabulary And Multitask Image Segmentation

Times cited: 0

Authors:

Pan, Lihu [1]

Yang, Yunting [1]

Wang, Zhengkui [2]

Shan, Wen [3]

Yin, Jaili [1]

Affiliations:

[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China

[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore

[3] Singapore Univ Social Sci, Singapore, Singapore

Source:

39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024

Keywords:

Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask;

DOI:

10.1145/3605098.3636192

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Open-vocabulary learning has revolutionized image segmentation, enabling the delineation of arbitrary categories from textual descriptions. While current methods often employ specialized architectures, OVAMTSeg presents a unified framework for Open-Vocabulary and Multitask Image Segmentation. Leveraging adaptive prompt learning, OVAMTSeg captures category-sensitive concepts and remains robust across diverse multitask scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. The framework incorporates a transformer-based decoder for dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness: it achieves 47.5 mIoU in referring expression segmentation, 51.6 mIoU on Pascal-VOC with four unseen classes, 46.6 mIoU on Pascal-Context in zero-shot segmentation, and 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i in one-shot segmentation.
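The results in the abstract are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed. It is not taken from the paper: the function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. Benchmark protocols usually accumulate intersections and unions over a whole dataset rather than over a single image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Return mIoU (in percent) for flattened per-pixel class labels.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            # Assumption: a class absent from both maps is skipped.
            continue
        ious.append(inter / union)

    if not ious:
        raise ValueError("no class is present in pred or target")
    return 100.0 * sum(ious) / len(ious)


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"{score:.1f}")  # per-class IoUs are 1/3, 2/3 and 1/2, so this prints 50.0
```

On this scale, a reported value such as 47.5 mIoU corresponds to an average per-class overlap of 47.5 percent between predicted and ground-truth masks.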


Pages: 1048 / 1049

Page count: 2
