Purify Then Guide: A Bi-Directional Bridge Network for Open-Vocabulary Semantic Segmentation

Authors
Pan, Yuwen [1]
Sun, Rui [2]
Wang, Yuan
Yang, Wenfei [2, 3]
Zhang, Tianzhu [2, 3]
Zhang, Yongdong [2, 4]
Affiliations
[1] Univ Sci & Technol China, Sch Cyber Sci & Technol, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
[3] Deep Space Explorat Lab, Hefei 230027, Peoples R China
[4] Peoples Daily Online, State Key Lab Commun Content Cognit, Beijing 100733, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Vocabulary; Semantic segmentation; Reliability; Visualization; Proposals; Modulation; Open-vocabulary semantic segmentation; semantic purification; bi-directional guidance; reliable attention;
DOI
10.1109/TCSVT.2024.3464631
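Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract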
Open-vocabulary semantic segmentation (OVSS) aims to segment an image into regions of corresponding semantic vocabularies, without being limited to a predefined set of object categories. Existing works mainly utilize large-scale vision-language models (e.g., CLIP) to leverage their superior open-vocabulary classification abilities in a two-stage manner. However, their heavy reliance on the first-stage segmentation network leaves the full potential of CLIP untapped, creating an unresolved gap between the rich pre-training knowledge and the challenging per-pixel classification task. Although the recent one-stage paradigm has further leveraged pre-trained vision knowledge from CLIP, it fails to effectively utilize text information due to the inclusion of numerous unrelated semantics in the vocabulary list. How to avoid noise interference in text information and utilize language guidance remains a Gordian knot. In this paper, we propose a bi-directional bridge network (BBN) to bridge the gap between upstream pre-trained models and downstream segmentation tasks. It first purifies the noisy text embedding and then guides semantics-vision aggregation with the purified information in a purification-then-guidance manner, thereby facilitating effective semantic utilization. Specifically, we design an optimal purification modulator to purify noisy text information via the optimal transport algorithm, and a reliable guidance modulator to integrate proper textual information into vision embedding via the designed reliable attention in an adaptive manner. Extensive experimental results on five challenging benchmarks demonstrate that our BBN performs favorably against state-of-the-art open-vocabulary semantic segmentation methods.
Pages: 343-356
Number of pages: 14