Text-Vision Relationship Alignment for Referring Image Segmentation

Cited by: 0
Authors
Pu, Mingxing [1]
Luo, Bing [1]
Zhang, Chao [2]
Xu, Li [3]
Xu, Fayou [1]
Kong, Mingming [1]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China
[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantic parsing; Text-vision alignment; Referring image segmentation;
DOI
10.1007/s11063-024-11487-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring image segmentation aims to segment an object in an image based on a referring expression. Its difficulty lies in aligning expression semantics with visual instances. Existing methods based on semantic reasoning are limited by the performance of an external syntax parser and do not explicitly explore the relationships between visual instances. This article proposes an end-to-end method for referring image segmentation that aligns 'linguistic relationships' with 'visual relationships'. The method does not rely on an external syntax parser for expression parsing. Instead, the Semantic Component Parser (SCP) adaptively and structurally parses the expression into three components, 'subject', 'object', and 'linguistic relationship', in a learnable manner. The Instances Activation Map Module (IAM) locates multiple visual instances based on the subject and object. In addition, the Relationship Based Visual Localization Module (RBVL) first enables each instance of the image to learn global knowledge, then decodes the visual relationships between these visual instances, and finally aligns the visual relationships with the linguistic relationships to locate the target object more accurately. Experimental results show that the proposed method improves performance by 4-9% compared with the baseline method on multiple referring image segmentation datasets.
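The alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: SCP, IAM, and RBVL are learned neural modules, whereas the code below uses hand-made vectors and only Python's standard library. The function names (`soft_component`, `pick_target`), the use of attention pooling to form a component vector, and the use of a feature difference as the "visual relationship" are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def soft_component(word_vecs, query):
    """Hypothetical stand-in for a learnable parser: attention-pool the
    word vectors into one component vector (e.g. 'subject') using a query."""
    weights = softmax([sum(w * q for w, q in zip(vec, query)) for vec in word_vecs])
    dim = len(word_vecs[0])
    return [sum(wt * vec[d] for wt, vec in zip(weights, word_vecs)) for d in range(dim)]


def pick_target(instances, relation_vec):
    """Score every ordered instance pair (i, j). The toy 'visual relationship'
    is the feature difference instances[i] - instances[j]; the pair whose
    relationship best matches relation_vec (by cosine) wins."""
    best_pair, best_score = None, -2.0
    for i, a in enumerate(instances):
        for j, b in enumerate(instances):
            if i == j:
                continue
            visual_rel = [x - y for x, y in zip(a, b)]
            score = cosine(visual_rel, relation_vec)
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score


# Toy usage: two instances described only by their (x, y) centers,
# and a made-up relation vector standing for "left of".
centers = [(0.2, 0.5), (0.8, 0.5)]
left_of = (-1.0, 0.0)
pair, score = pick_target(centers, left_of)
print(pair, round(score, 6))
```

In this toy setting, instance 0 is selected as the target because its offset from instance 1 points in the same direction as the "left of" vector. In the method described above, the instance features, relationship decoding, and alignment would all be learned end to end rather than hand-specified.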
Pages: 21
Related papers
50 in total
  • [21] SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation
    Ouyang, Shuyi
    Wang, Hongyi
    Xie, Shiao
    Niu, Ziwei
    Tong, Ruofeng
    Chen, Yen-Wei
    Lin, Lanfen
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1294 - 1302
  • [22] SMVT: Spectrum-Driven Multi-scale Vision Transformer for Referring Image Segmentation
    Li, Tianxiao
    Chen, Junhong
    Huang, Yiheng
    Huang, Kesi
    Xia, Qiqiang
    Asim, Muhammad
    Liu, Wenyin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VI, ICIC 2024, 2024, 14867 : 193 - 206
  • [23] Unambiguous Scene Text Segmentation With Referring Expression Comprehension
    Rong, Xuejian
    Yi, Chucai
    Tian, Yingli
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 591 - 601
  • [24] DIAL: Dense Image-Text ALignment for Weakly Supervised Semantic Segmentation
    Jang, Soojin
    Yun, Jungmin
    Kwon, Junehyoung
    Lee, Eunju
    Kim, Youngbin
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 248 - 266
  • [25] Text-Guided Image Manipulation via Generative Adversarial Network With Referring Image Segmentation-Based Guidance
    Watanabe, Yuto
    Togo, Ren
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    IEEE ACCESS, 2023, 11 : 42534 - 42545
  • [26] RRSIS: Referring Remote Sensing Image Segmentation
    Yuan, Zhenghang
    Mou, Lichao
    Hua, Yuansheng
    Zhu, Xiao Xiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 12
  • [27] Distillation and Supplementation of Features for Referring Image Segmentation
    Tan, Zeyu
    Xu, Dahong
    Li, Xi
    Liu, Hong
    IEEE ACCESS, 2024, 12 : 171269 - 171279
  • [28] Image Segmentation With Language Referring Expression and Comprehension
    Sun, Jiaxing
    Li, Yujie
    Cai, Jintong
    Lu, Huimin
    Serikawa, Seiichi
    IEEE SENSORS JOURNAL, 2022, 22 (18) : 17406 - 17413
  • [29] Recurrent Multimodal Interaction for Referring Image Segmentation
    Liu, Chenxi
    Lin, Zhe
    Shen, Xiaohui
    Yang, Jimei
    Lu, Xin
    Yuille, Alan
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1280 - 1289
  • [30] Referring Image Segmentation by Generative Adversarial Learning
    Qiu, Shuang
    Zhao, Yao
    Jiao, Jianbo
    Wei, Yunchao
    Wei, Shikui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (05) : 1333 - 1344