Purify Then Guide: A Bi-Directional Bridge Network for Open-Vocabulary Semantic Segmentation

Times cited: 0
Authors
Pan, Yuwen [1]
Sun, Rui [2]
Wang, Yuan
Yang, Wenfei [2, 3]
Zhang, Tianzhu [2, 3]
Zhang, Yongdong [2, 4]
Affiliations
[1] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
[3] Deep Space Explorat Lab, Hefei 230027, Peoples R China
[4] Peoples Daily Online, State Key Lab Commun Content Cognit, Beijing 100733, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Vocabulary; Semantic segmentation; Reliability; Visualization; Proposals; Modulation; Open-vocabulary semantic segmentation; semantic purification; bi-directional guidance; reliable attention
DOI
10.1109/TCSVT.2024.3464631
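Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809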
Abstract
Open-vocabulary semantic segmentation (OVSS) aims to segment an image into regions corresponding to arbitrary semantic vocabularies, without being limited to a predefined set of object categories. Existing works mainly exploit the strong open-vocabulary classification ability of large-scale vision-language models (e.g., CLIP) in a two-stage manner. However, their heavy reliance on the first-stage segmentation network leaves the full potential of CLIP untapped and creates an unresolved gap between the rich pre-training knowledge and the challenging per-pixel classification task. Although the recent one-stage paradigm makes further use of CLIP's pre-trained visual knowledge, it fails to exploit text information effectively because the vocabulary list contains numerous semantics unrelated to the image. How to avoid noise interference in the text information while still exploiting language guidance therefore remains a Gordian knot. In this paper, we propose a bi-directional bridge network (BBN) to bridge the gap between upstream pre-trained models and downstream segmentation tasks. Working in a purification-then-guidance manner, BBN first purifies the noisy text embeddings and then uses the purified information to guide semantics-vision aggregation, thereby making effective use of semantics. Specifically, we design an optimal purification modulator that purifies noisy text information via the optimal transport algorithm, and a reliable guidance modulator that adaptively integrates appropriate textual information into the vision embeddings via a purpose-designed reliable attention. Extensive experiments on five challenging benchmarks demonstrate that BBN performs favorably against state-of-the-art open-vocabulary semantic segmentation methods.
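The abstract says the optimal purification modulator relies on optimal transport but gives no formulation. As a generic illustration only, the sketch below runs the standard entropic OT solver (Sinkhorn-Knopp) between vocabulary text embeddings and visual tokens. The cost (1 minus cosine similarity), the uniform marginals, `eps`, the iteration count, the toy embeddings, and the helper names `cosine` and `sinkhorn` are all assumptions, not BBN's actual design. The step that turns the transport plan into purified text embeddings is deliberately omitted because the abstract does not describe it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic optimal transport via Sinkhorn-Knopp scaling (generic, not BBN-specific).

    Returns a plan T (len(a) x len(b)) whose row sums approach a and whose
    column sums equal b after the final column update.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); eps is an assumed regularization strength.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy data: 2-D "text" and "visual" embeddings, illustrative only.
text_emb = {"cat": [1.0, 0.0], "car": [0.0, 1.0]}
visual_tokens = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
names = list(text_emb)

# Assumed cost: 1 - cosine similarity; assumed uniform marginals on both sides.
cost = [[1.0 - cosine(text_emb[name], tok) for tok in visual_tokens] for name in names]
plan = sinkhorn(cost, a=[1 / len(names)] * len(names), b=[1 / len(visual_tokens)] * len(visual_tokens))

# For each visual token, the vocabulary entry that receives the most transported mass.
matches = [names[max(range(len(names)), key=lambda i: plan[i][j])] for j in range(len(visual_tokens))]
print(matches)  # ['cat', 'cat', 'car', 'car']
```

Entropic regularization keeps the plan dense and differentiable, which is convenient inside a trainable network; a smaller `eps` moves the plan closer to a hard assignment at the cost of slower convergence.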
Pages: 343-356
Number of pages: 14