Open-Vocabulary And Multitask Image Segmentation

Authors
Pan, Lihu [1]
Yang, Yunting [1]
Wang, Zhengkui [2]
Shan, Wen [3]
Yin, Jaili [1]
Affiliations
[1] Taiyuan Univ Sci & Technol, Taiyuan, Peoples R China
[2] Singapore Inst Technol, Infocomm Technol Cluster, Singapore, Singapore
[3] Singapore Univ Social Sci, Singapore, Singapore
Keywords
Image segmentation; Adaptive prompt learning; Image-text fusion; Multitask
DOI
10.1145/3605098.3636192
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Open-vocabulary learning has revolutionized image segmentation, enabling the delineation of arbitrary categories from textual descriptions. While current methods often employ specialized architectures, OVAMTSeg presents a unified framework for Open-Vocabulary and Multitask Image Segmentation. Leveraging adaptive prompt learning, OVAMTSeg excels at capturing category-sensitive concepts, ensuring robustness across diverse multi-task scenarios. Text prompts capture semantic and contextual features, while cross-attention and cross-modal interactions fuse image and text features. The framework incorporates a transformer-based decoder for dense prediction. Experimental results demonstrate OVAMTSeg's effectiveness: it achieves 47.5 mIoU in referring expression segmentation, 51.6 mIoU on Pascal-VOC with four unseen classes, 46.6 mIoU on Pascal-Context in zero-shot segmentation, and, for one-shot segmentation, 65.9 mIoU on Pascal-5i and 35.7 mIoU on COCO-20i.
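All results in the abstract are reported as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `mean_iou`, the flattened label-map inputs, and the choice to average only over classes that occur in the prediction or the ground truth are assumptions made for illustration, and benchmark protocols (for example, the per-fold evaluation on Pascal-5i) may aggregate differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over flattened label maps (hypothetical helper, for illustration).

    Classes absent from both `pred` and `target` are skipped so they do not
    distort the average; this is an assumption, not a fixed convention.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four pixels, two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"{100 * score:.1f} mIoU")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12; papers usually report this value multiplied by 100.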
Pages: 1048-1049
Number of pages: 2