Bilateral Knowledge Interaction Network for Referring Image Segmentation

被引:7
|
作者
Ding, Haixin [1 ]
Zhang, Shengchuan [1 ]
Wu, Qiong [1 ]
Yu, Songlin [1 ]
Hu, Jie [1 ]
Cao, Liujuan [1 ]
Ji, Rongrong [1 ]
机构
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China
关键词
Image segmentation; Visualization; Kernel; Knowledge engineering; Feature extraction; Semantics; Convolution; Referring image segmentation; vision-language; AGGREGATION;
D O I
10.1109/TMM.2023.3305869
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Referring image segmentation aims to segment objects that are described by natural language expressions. Although remarkable advancements have been made to align natural language expressions with visual representations for better performance, the interaction between image-level and text-level information is still not formulated properly. Most of the previous works focus on building correlations between vision and language, ignoring the variety of objects. The target objects with unique appearances may not be correctly located or completely segmented. In this article, we propose a novel Bilateral Knowledge Interaction Network, termed BKINet, which reformulates the image-text interaction in a bilateral manner to adapt concrete knowledge of the target object in the image. BKINet contains two key components: a knowledge learning module (KLM) and a knowledge applying module (KAM). In the KLM, the abstract knowledge from text features is replenished with concrete knowledge from visual features to adapt to the target objects in the input images, which generates the knowledge interaction kernels (KI kernels) containing abundant referring information. With the referring information of KI kernels, the KAM is designed to highlight the most relevant visual features for predicting the accurate segmentation mask. Extensive experiments on three widely-used datasets, i.e. RefCOCO, RefCOCO+, and G-ref, demonstrate the superiority of BKINet over the state-of-the-art.
引用
收藏
页码:2966 / 2977
页数:12
相关论文
共 50 条
  • [1] Dual-graph hierarchical interaction network for referring image segmentation
    Shi, Zhaofeng
    Wu, Qingbo
    Li, Hongliang
    Meng, Fanman
    Ngan, King Ngi
    DISPLAYS, 2023, 80
  • [2] Recurrent Multimodal Interaction for Referring Image Segmentation
    Liu, Chenxi
    Lin, Zhe
    Shen, Xiaohui
    Yang, Jimei
    Lu, Xin
    Yuille, Alan
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1280 - 1289
  • [3] Structured Attention Network for Referring Image Segmentation
    Lin, Liang
    Yan, Pengxiang
    Xu, Xiaoqian
    Yang, Sibei
    Zeng, Kun
    Li, Guanbin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1922 - 1932
  • [4] Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
    Liu, Sihan
    Ma, Yiwei
    Zhang, Xiaoqing
    Wang, Haowei
    Ji, Jiayi
    Sun, Xiaoshuai
    Ji, Rongrong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26648 - 26658
  • [5] Structured Multimodal Fusion Network for Referring Image Segmentation
    Xue, Mingcheng
    Liu, Yu
    Xu, Kaiping
    Zhang, Haiyang
    Yu, Chengyang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 36 - 47
  • [6] Dual Convolutional LSTM Network for Referring Image Segmentation
    Ye, Linwei
    Liu, Zhi
    Wang, Yang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3224 - 3235
  • [7] Query Reconstruction Network for Referring Expression Image Segmentation
    Shi, Hengcan
    Li, Hongliang
    Wu, Qingbo
    Ngan, King Ngi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 995 - 1007
  • [8] PRNet: A Progressive Refinement Network for referring image segmentation
    Liu, Jing
    Jiang, Huajie
    Hu, Yongli
    Yin, Baocai
    NEUROCOMPUTING, 2025, 630
  • [9] A CONTEXT-BASED NETWORK FOR REFERRING IMAGE SEGMENTATION
    Li, Xinyu
    Liu, Yu
    Xu, Kaiping
    Zhao, Zhehuan
    Liu, Sipei
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1436 - 1440
  • [10] Global Selection and Local Attention Network for Referring Image Segmentation
    Ding, Haixin
    Zhang, Shengchuan
    Cao, Liujuan
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VII, 2024, 14431 : 284 - 295