Purify Then Guide: A Bi-Directional Bridge Network for Open-Vocabulary Semantic Segmentation

Cited by: 0
Authors
Pan, Yuwen [1]
Sun, Rui [2]
Wang, Yuan
Yang, Wenfei [2, 3]
Zhang, Tianzhu [2, 3]
Zhang, Yongdong [2, 4]
Affiliations
[1] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
[3] Deep Space Explorat Lab, Hefei 230027, Peoples R China
[4] Peoples Daily Online, State Key Lab Commun Content Cognit, Beijing 100733, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Vocabulary; Semantic segmentation; Reliability; Visualization; Proposals; Modulation; Open-vocabulary semantic segmentation; semantic purification; bi-directional guidance; reliable attention;
DOI
10.1109/TCSVT.2024.3464631
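Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract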
Open-vocabulary semantic segmentation (OVSS) aims to segment an image into regions of corresponding semantic vocabularies, without being limited to a predefined set of object categories. Existing works mainly utilize large-scale vision-language models (e.g., CLIP) to leverage their superior open-vocabulary classification abilities in a two-stage manner. However, their heavy reliance on the first-stage segmentation network leaves the full potential of CLIP untapped, creating an unresolved gap between the rich pre-training knowledge and the challenging per-pixel classification task. Although the recent one-stage paradigm has further leveraged pre-trained vision knowledge from CLIP, it fails to effectively utilize text information due to the inclusion of numerous unrelated semantics in the vocabulary list. How to avoid noise interference in text information and utilize language guidance remains a Gordian knot. In this paper, we propose a bi-directional bridge network (BBN) to bridge the gap between upstream pre-trained models and downstream segmentation tasks. It first purifies the noisy text embedding and then guides semantics-vision aggregation with the purified information in a purification-then-guidance manner, thereby facilitating effective semantic utilization. Specifically, we design an optimal purification modulator to purify noisy text information via the optimal transport algorithm, and a reliable guidance modulator to integrate proper textual information into vision embedding via the designed reliable attention in an adaptive manner. Extensive experimental results on five challenging benchmarks demonstrate that our BBN performs favorably against state-of-the-art open-vocabulary semantic segmentation methods.
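As an illustration of the optimal transport step mentioned in the abstract, the following is a minimal sketch of entropy-regularized optimal transport (Sinkhorn iterations) between class text embeddings and pixel embeddings. It is not the authors' optimal purification modulator: the function names `sinkhorn` and `refine_text_embeddings`, the cosine cost, the uniform marginals, and the way the transport plan is used to re-weight features are all assumptions made for this example.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (C, N) cost matrix; a: (C,) row marginal; b: (N,) column marginal.
    Returns a (C, N) transport plan whose rows sum to `a` and whose
    columns sum approximately to `b`.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel, strictly positive
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def refine_text_embeddings(text_emb, pixel_emb, eps=0.5):
    """Hypothetical refinement step (an assumption, not the paper's module).

    Each class embedding is replaced by a transport-weighted average of the
    normalized pixel embeddings it is matched to.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    cost = 1.0 - t @ p.T             # cosine distance, values in [0, 2]
    n_cls, n_pix = cost.shape
    a = np.full(n_cls, 1.0 / n_cls)  # uniform mass over classes (assumed)
    b = np.full(n_pix, 1.0 / n_pix)  # uniform mass over pixels (assumed)
    plan = sinkhorn(cost, a, b, eps=eps)
    weights = plan / plan.sum(axis=1, keepdims=True)
    return weights @ p, plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(5, 8))    # 5 class embeddings, dimension 8
    pixels = rng.normal(size=(12, 8)) # 12 pixel embeddings, dimension 8
    refined, plan = refine_text_embeddings(text, pixels)
    print(refined.shape, plan.shape)
```

The uniform marginals force every class to receive the same total mass, which is one simple way to keep a few dominant classes from absorbing all pixels; a real system may well choose the marginals differently.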
Pages: 343-356
Number of pages: 14