Vision-Aware Language Reasoning for Referring Image Segmentation

Cited by: 0
Authors
Xu, Fayou [1]
Luo, Bing [1]
Zhang, Chao [2]
Xu, Li [3]
Pu, Mingxing [1]
Li, Bo [1]
Affiliations
[1] Xihua University, School of Computer and Software Engineering, Chengdu 610039, People's Republic of China
[2] Sichuan Police College, Key Laboratory of Intelligent Policing, Luzhou 646000, People's Republic of China
[3] Xihua University, School of Science, Chengdu 610039, People's Republic of China
Keywords
Referring image segmentation; Vision and language; Explainable language-structure reasoning
DOI
10.1007/s11063-023-11377-z
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation is a joint multimodal task that aims to segment the object described by a natural-language expression from the paired image. However, the diversity of language annotations tends to cause semantic ambiguity, which makes the encoded language features semantically imprecise. Existing methods do not correct the language encoding module, so semantic errors in the language features cannot be remedied in subsequent stages, which leads to semantic deviation. To this end, we propose a vision-aware language reasoning model. Intuitively, the segmentation result can be used to guide the reconstruction of the language features, a process that can be expressed as tree-structured recursion. Specifically, we design a language reasoning encoding module and a mask loopback optimization module to optimize the language encoding tree, and the feature weights of the tree nodes are learned through backpropagation. Traditional attention modules, which align local words with visual regions, tend to introduce noisy regions. To overcome this, we use global language prior information to compute the importance of each word and use these importance scores to further weight the visual region features; we call this the language-aware vision attention module. Experimental results on four benchmark datasets show that the proposed method improves performance.
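The abstract describes the language-aware vision attention module only at a high level. The sketch below illustrates one plausible reading of it in pure Python: a global language prior scores the importance of each word, the importance scores produce a weighted sentence vector, and that vector gates each visual region feature. The choice of the mean word vector as the "global language prior", the scaled dot-product scoring, the sigmoid gate, and the function name `language_aware_vision_attention` are all hypothetical assumptions made for illustration; they are not details taken from the paper or its implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_aware_vision_attention(
    words: List[Vector], regions: List[Vector]
) -> Tuple[List[float], List[Vector]]:
    """Hypothetical sketch: weight words by a global prior, then gate regions.

    words:   one feature vector per word (all of length D)
    regions: one feature vector per visual region (all of length D)
    Returns (word importance weights, gated region features).
    """
    dim = len(words[0])
    scale = math.sqrt(dim)
    # Assumption: the global language prior is the mean of all word features.
    global_prior = [sum(w[k] for w in words) / len(words) for k in range(dim)]
    # Importance of each word relative to the whole expression (sums to 1).
    word_weights = softmax([dot(w, global_prior) / scale for w in words])
    # Sentence vector that emphasizes the more important words.
    lang = [sum(a * w[k] for a, w in zip(word_weights, words)) for k in range(dim)]
    # Assumption: each region is scaled by a sigmoid of its agreement with `lang`.
    gated = []
    for v in regions:
        gate = 1.0 / (1.0 + math.exp(-dot(v, lang) / scale))
        gated.append([gate * x for x in v])
    return word_weights, gated


if __name__ == "__main__":
    toy_words = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    toy_regions = [[2.0, 0.0], [0.0, 2.0]]
    weights, gated_regions = language_aware_vision_attention(toy_words, toy_regions)
    print("word weights:", weights)
    print("gated regions:", gated_regions)
```

In the toy example, the two words that share the dominant direction receive more weight, so the region aligned with that direction is attenuated less than the other region. A sigmoid gate is used so that each region is scaled independently rather than competing through a softmax; this, too, is an illustrative choice rather than a claim about the authors' design.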
Pages: 11313-11331
Number of pages: 19
Related papers (50 in total)
  • [11] Text-Vision Relationship Alignment for Referring Image Segmentation
    Pu, Mingxing
    Luo, Bing
    Zhang, Chao
    Xu, Li
    Xu, Fayou
    Kong, Mingming
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [13] Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation
    Yan, Yichen
    He, Xingjian
    Chen, Sihan
    Lu, Shichen
    Liu, Jing
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024, 2024, 14872 : 313 - 324
  • [14] VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
    Ding, Henghui
    Liu, Chang
    Wang, Suchen
    Jiang, Xudong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7900 - 7916
  • [15] Referring Image Segmentation via Language-Driven Attention
    Chen, Ding-Jie
    Hsieh, He-Yen
    Liu, Tyng-Luh
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021 : 13997 - 14003
  • [16] Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
    Yan, Yichen
    He, Xingjian
    Chen, Sihan
    Liu, Jing
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024 : 451 - 459
  • [17] Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
    Yang, Zhao
    Wang, Jiaqi
    Tang, Yansong
    Chen, Kai
    Zhao, Hengshuang
    Torr, Philip H. S.
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023 : 3222 - 3230
  • [18] Key-Word-Aware Network for Referring Expression Image Segmentation
    Shi, Hengcan
    Li, Hongliang
    Meng, Fanman
    Wu, Qingbo
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 38 - 54
  • [19] Cross-modal attention guided visual reasoning for referring image segmentation
    Zhang, Wenjing
    Hu, Mengnan
    Tan, Quange
    Zhou, Qianli
    Wang, Rong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28853 - 28872
  • [20] CMIRNet: Cross-Modal Interactive Reasoning Network for Referring Image Segmentation
    Xu, Mingzhu
    Xiao, Tianxiang
    Liu, Yutong
    Tang, Haoyu
    Hu, Yupeng
    Nie, Liqiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 3234 - 3249