Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation

Citations: 0
Authors
Yan, Yichen [1, 2]
He, Xingjian [1]
Chen, Sihan [2]
Lu, Shichen [3]
Liu, Jing [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Referring Image Segmentation; Vision-Language Models; Fusion & Calibration;
DOI
10.1007/978-981-97-5612-4_27
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring Image Segmentation (RIS) aims to segment the object described by a natural-language expression from an image; the main challenge is establishing text-to-pixel correlation. Previous methods typically rely on features from a single modality, either vision or language, to guide multi-modal fusion. This limits the interaction between vision and language and leaves the decoder without fine-grained correlation between the language description and pixel-level details. In this paper, we introduce FCNet, a framework built on bi-directional guided fusion in which both vision and language play guiding roles. Specifically, a vision-guided step performs the initial multi-modal fusion, yielding multi-modal features that focus on key visual information. A language-guided calibration module then calibrates these features so that they reflect the context of the input sentence. This bi-directional vision-language guidance sends higher-quality multi-modal features to the decoder and facilitates adaptive propagation of fine-grained semantic information from textual features to visual features. Experiments on the RefCOCO, RefCOCO+, and G-Ref datasets with various backbones consistently show our approach outperforming state-of-the-art methods.
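The two-stage idea in the abstract (fuse under vision guidance, then calibrate under language guidance) can be illustrated with a toy sketch. This is not the FCNet implementation: the abstract does not specify the operators, so the sketch assumes that stage 1 is a residual cross-attention in which pixel features query word features, and that stage 2 is a channel-wise sigmoid gate derived from the mean word feature. The function names `vision_guided_fusion` and `language_guided_calibration`, the gating choice, and the tiny 2-dimensional inputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vision_guided_fusion(pixels, words):
    """Stage 1 (assumed form): each pixel feature queries the word features.

    Returns one fused vector per pixel: the pixel feature plus an
    attention-weighted sum of word features (residual connection).
    """
    dim = len(pixels[0])
    scale = math.sqrt(dim)
    fused = []
    for pixel in pixels:
        scores = [sum(p * w for p, w in zip(pixel, word)) / scale for word in words]
        weights = softmax(scores)
        attended = [
            sum(wt * word[d] for wt, word in zip(weights, words)) for d in range(dim)
        ]
        fused.append([p + a for p, a in zip(pixel, attended)])
    return fused


def language_guided_calibration(fused, words):
    """Stage 2 (assumed form): a sentence-level gate rescales each channel.

    The sentence vector is the mean of the word features; its sigmoid is
    used as a per-channel gate in (0, 1) applied to every fused vector.
    """
    dim = len(words[0])
    sentence = [sum(word[d] for word in words) / len(words) for d in range(dim)]
    gate = [1.0 / (1.0 + math.exp(-s)) for s in sentence]
    return [[f * g for f, g in zip(vec, gate)] for vec in fused]


# Toy inputs: 3 "pixels" and 2 "words", each a 2-dimensional feature.
pixels = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[1.0, 0.0], [0.0, 1.0]]

fused = vision_guided_fusion(pixels, words)
calibrated = language_guided_calibration(fused, words)
print(calibrated)
```

In a real model the pixel and word features would come from learned vision and language backbones and the attention would use learned projections; the sketch only shows the order of the two guidance steps and that the output keeps one vector per pixel for the decoder.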
Pages: 313-324
Page count: 12