Bilateral Knowledge Interaction Network for Referring Image Segmentation

Cited by: 7
Authors
Ding, Haixin [1]
Zhang, Shengchuan [1]
Wu, Qiong [1]
Yu, Songlin [1]
Hu, Jie [1]
Cao, Liujuan [1]
Ji, Rongrong [1]
Affiliation
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China
Keywords
Image segmentation; Visualization; Kernel; Knowledge engineering; Feature extraction; Semantics; Convolution; Referring image segmentation; vision-language; AGGREGATION
DOI
10.1109/TMM.2023.3305869
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Referring image segmentation aims to segment objects that are described by natural language expressions. Although remarkable advancements have been made in aligning natural language expressions with visual representations for better performance, the interaction between image-level and text-level information is still not formulated properly. Most previous works focus on building correlations between vision and language and ignore the variety of objects, so target objects with unique appearances may not be correctly located or completely segmented. In this article, we propose a novel Bilateral Knowledge Interaction Network, termed BKINet, which reformulates the image-text interaction in a bilateral manner to adapt to concrete knowledge of the target object in the image. BKINet contains two key components: a knowledge learning module (KLM) and a knowledge applying module (KAM). In the KLM, the abstract knowledge from text features is replenished with concrete knowledge from visual features to adapt to the target objects in the input images, which generates knowledge interaction kernels (KI kernels) containing abundant referring information. With the referring information of the KI kernels, the KAM is designed to highlight the most relevant visual features for predicting an accurate segmentation mask. Extensive experiments on three widely used datasets, i.e., RefCOCO, RefCOCO+, and G-ref, demonstrate the superiority of BKINet over state-of-the-art methods.
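The two-stage idea in the abstract (kernels learned from text and enriched with visual evidence, then applied back to the visual features) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the cross-attention form, the residual addition, the averaging of kernel responses, and the names `learn_kernels` and `apply_kernels` are all assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def learn_kernels(text_feats, visual_feats):
    """KLM-like step (assumed form, hypothetical helper).

    text_feats:   (T, C) word-level text features ("abstract knowledge").
    visual_feats: (N, C) flattened visual features, N = H * W.
    Returns (T, C) kernels: text features plus attended visual features.
    """
    channels = text_feats.shape[1]
    # Each text token attends over all spatial positions.
    attn = softmax(text_feats @ visual_feats.T / np.sqrt(channels), axis=-1)
    concrete = attn @ visual_feats  # (T, C) "concrete knowledge" (assumption)
    return text_feats + concrete    # residual enrichment (assumption)


def apply_kernels(kernels, visual_feats, height, width):
    """KAM-like step (assumed form, hypothetical helper).

    Treats each kernel as a 1x1 dynamic filter over the visual features,
    averages the responses, and squashes them into a soft mask.
    """
    responses = visual_feats @ kernels.T  # (N, T)
    logits = responses.mean(axis=1)       # (N,)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs.reshape(height, width)


# Toy sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
H, W, C, T = 4, 5, 8, 3
visual = rng.standard_normal((H * W, C))
text = rng.standard_normal((T, C))

kernels = learn_kernels(text, visual)
mask = apply_kernels(kernels, visual, H, W)
print(kernels.shape, mask.shape)
```

In this sketch the kernels depend on the input image as well as the expression, which is the property the abstract attributes to the KI kernels; everything else about the real modules would need to be taken from the paper itself.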
Pages: 2966-2977
Page count: 12