Open-Vocabulary And Multitask Image Segmentation

Cited by: 0
Authors
Pan, Lihu [1]
Yang, Yunting [1]
Wang, Zhengkui [2]
Shan, Wen [3]
Yin, Jaili [1]
Affiliations
[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China
[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore
[3] Singapore Univ Social Sci, Singapore, Singapore
Keywords
Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask
DOI
10.1145/3605098.3636192
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Open-vocabulary learning has revolutionized image segmentation, enabling the delineation of arbitrary categories from textual descriptions. While current methods often employ specialized architectures, OVAMTSeg presents a unified framework for Open-Vocabulary and Multitask Image Segmentation. Leveraging adaptive prompt learning, OVAMTSeg captures category-sensitive concepts, which supports robustness across diverse multitask scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. The framework incorporates a transformer-based decoder for dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness: it achieves 47.5 mIoU in referring expression segmentation; 51.6 mIoU on Pascal-VOC with four unseen classes and 46.6 mIoU on Pascal-Context in zero-shot segmentation; and 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i in one-shot segmentation.
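The results in the abstract are reported as mIoU (mean intersection-over-union). As a minimal sketch of the standard metric, not the authors' evaluation code, the snippet below computes it for flat lists of per-pixel class labels. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both masks are illustrative assumptions; the benchmark protocols behind the reported numbers (for example, per-fold averaging) are not described in this record.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union, averaged over classes that occur
    in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels,
    one entry per pixel.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            # Assumption: a class absent from both masks is skipped.
            continue
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 6-pixel example with 3 classes.
pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {100 * score:.1f}")
```

Scores are conventionally multiplied by 100, which is how a figure such as 47.5 mIoU should be read.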
Pages: 1048-1049
Page count: 2