Text-Vision Relationship Alignment for Referring Image Segmentation

Cited by: 0
Authors
Pu, Mingxing [1]
Luo, Bing [1]
Zhang, Chao [2]
Xu, Li [3]
Xu, Fayou [1]
Kong, Mingming [1]
Affiliations
[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China
[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic parsing; Text-vision alignment; Referring image segmentation
DOI
10.1007/s11063-024-11487-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation aims to segment an object in an image based on a referring expression. Its difficulty lies in aligning the semantics of the expression with visual instances. Existing methods based on semantic reasoning are limited by the performance of an external syntax parser and do not explicitly explore the relationships between visual instances. This article proposes an end-to-end method for referring image segmentation that aligns 'linguistic relationships' with 'visual relationships' and does not rely on an external syntax parser for expression parsing. The Semantic Component Parser (SCP) adaptively and structurally parses the expression into three components, 'subject', 'object', and 'linguistic relationship', in a learnable manner. The Instances Activation Map Module (IAM) locates multiple visual instances based on the subject and object. In addition, the Relationship Based Visual Localization Module (RBVL) first enables each instance in the image to learn global knowledge, then decodes the visual relationships between these visual instances, and finally aligns the visual relationships with the linguistic relationships to locate the target object more accurately. Experimental results show that the proposed method improves performance by 4-9% compared with the baseline method on multiple referring image segmentation datasets.
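The pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so every name below (`parse_components`, `align_relationship`, the query vectors, and the use of a feature difference as a stand-in for a decoded visual relationship) is a hypothetical assumption. The sketch only shows the general idea of softly splitting an expression into components with learnable queries, then choosing the instance pair whose relationship best matches the linguistic one.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def parse_components(word_feats, queries):
    """Hypothetical parser: each query (assumed to be learned) attends
    over the word features and yields one weighted-average embedding."""
    dim = len(word_feats[0])
    components = {}
    for name, q in queries.items():
        weights = softmax([dot(q, w) for w in word_feats])
        components[name] = [
            sum(wt * w[d] for wt, w in zip(weights, word_feats))
            for d in range(dim)
        ]
    return components

def align_relationship(instance_feats, linguistic_rel):
    """Hypothetical alignment: score every ordered instance pair (i, j)
    by the cosine similarity between f_i - f_j (an assumed stand-in for
    a decoded visual relationship) and the linguistic relationship."""
    best_pair, best_score = None, -2.0
    for i, fi in enumerate(instance_feats):
        for j, fj in enumerate(instance_feats):
            if i == j:
                continue
            rel = [a - b for a, b in zip(fi, fj)]
            score = cosine(rel, linguistic_rel)
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score

# Toy usage: three one-hot "word" features and three 2-D instance features.
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = {
    "subject": [5.0, 0.0, 0.0],
    "relationship": [0.0, 5.0, 0.0],
    "object": [0.0, 0.0, 5.0],
}
components = parse_components(words, queries)
pair, score = align_relationship([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In the method the abstract describes, both steps are learned modules trained end to end; the fixed queries and the simple differencing here only make the data flow concrete.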
Pages: 21
Related papers
50 in total
  • [1] Text-Vision Relationship Alignment for Referring Image Segmentation
    Mingxing Pu
    Bing Luo
    Chao Zhang
    Li Xu
    Fayou Xu
    Mingming Kong
    Neural Processing Letters, 56
  • [2] Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation
    Lei, Sen
    Xiao, Xinyu
    Zhang, Tianlin
    Li, Heng-Chao
    Shi, Zhenwei
    Zhu, Qing
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [3] Referring Image Segmentation Using Text Supervision
    Liu, Fang
    Liu, Yuhao
    Kong, Yuqiu
    Xu, Ke
    Zhang, Lihe
    Yin, Baocai
    Hancke, Gerhard
    Lau, Rynson
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22067 - 22077
  • [4] Referring Image Segmentation Without Text Annotations
    Liu, Jing
    Jiang, Huajie
    Bi, Yandong
    Hu, Yongli
    Yin, Baocai
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 278 - 293
  • [5] See-Through-Text Grouping for Referring Image Segmentation
    Chen, Ding-Jie
    Jia, Songhao
    Lo, Yi-Chen
    Chen, Hwann-Tzong
    Liu, Tyng-Luh
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 7453 - 7462
  • [6] Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
    Kim, Dongwon
    Kim, Namyup
    Lan, Cuiling
    Kwak, Suha
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15501 - 15511
  • [7] Vision-Aware Language Reasoning for Referring Image Segmentation
    Fayou Xu
    Bing Luo
    Chao Zhang
    Li Xu
    Mingxing Pu
    Bo Li
    Neural Processing Letters, 2023, 55 : 11313 - 11331
  • [8] Vision-Aware Language Reasoning for Referring Image Segmentation
    Xu, Fayou
    Luo, Bing
    Zhang, Chao
    Xu, Li
    Pu, Mingxing
    Li, Bo
    NEURAL PROCESSING LETTERS, 2023, 55 (08) : 11313 - 11331
  • [9] LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
    Yang, Zhao
    Wang, Jiaqi
    Tang, Yansong
    Chen, Kai
    Zhao, Hengshuang
    Torr, Philip H. S.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 18134 - 18144
  • [10] Bidirectional Relationship Inferring Network for Referring Image Localization and Segmentation
    Feng, Guang
    Hu, Zhiwei
    Zhang, Lihe
    Sun, Jiayu
    Lu, Huchuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2246 - 2258