Text-Vision Relationship Alignment for Referring Image Segmentation

Cited by: 0

Authors:

Pu, Mingxing [1]

Luo, Bing [1]

Zhang, Chao [2]

Xu, Li [3]

Xu, Fayou [1]

Kong, Mingming [1]

Affiliations:

[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China

[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China

[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2024 / Volume 56 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic parsing; Text-vision alignment; Referring image segmentation;

DOI:

10.1007/s11063-024-11487-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation aims to segment an object in an image based on a referring expression. The difficulty lies in aligning expression semantics with visual instances. Existing methods based on semantic reasoning are limited by the performance of an external syntax parser and do not explicitly explore the relationships between visual instances. This article proposes an end-to-end method for referring image segmentation that aligns 'linguistic relationships' with 'visual relationships'. The method does not rely on an external syntax parser for expression parsing. Instead, the Semantic Component Parser (SCP) adaptively and structurally parses the expression into three components, 'subject', 'object', and 'linguistic relationship', in a learnable manner. The Instances Activation Map Module (IAM) locates multiple visual instances based on the subject and object. In addition, the Relationship Based Visual Localization Module (RBVL) first enables each instance in the image to learn global knowledge, then decodes the visual relationships between these visual instances, and finally aligns the visual relationships with the linguistic relationships to locate the target object more accurately. Experimental results show that the proposed method improves performance by 4-9% over the baseline method on multiple referring image segmentation datasets.
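
The parse-then-align idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so dot-product attention stands in for the learnable parser and cosine similarity stands in for the alignment step. All names (`soft_parse`, `best_aligned_pair`, the query vectors, the 3-dimensional embeddings, the instance labels) are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def soft_parse(word_vecs, queries):
    """Attention-pool word vectors into one vector per component.

    Assumption: one query vector per component; in a trained model
    these would be learned, here they are fixed toy values.
    """
    dim = len(word_vecs[0])
    components = {}
    for name, q in queries.items():
        weights = softmax([dot(q, w) for w in word_vecs])
        components[name] = [
            sum(wt * w[i] for wt, w in zip(weights, word_vecs))
            for i in range(dim)
        ]
    return components

def best_aligned_pair(linguistic_rel, visual_rels):
    """Pick the instance pair whose (assumed) visual relationship
    vector is most similar to the linguistic relationship vector."""
    return max(visual_rels, key=lambda p: cosine(linguistic_rel, visual_rels[p]))

# Toy embeddings for the expression "cat / left of / dog" (hypothetical).
word_vecs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
queries = {
    "subject": [4.0, 0.0, 0.0],
    "object": [0.0, 4.0, 0.0],
    "relationship": [0.0, 0.0, 4.0],
}
components = soft_parse(word_vecs, queries)

# Hypothetical relationship vectors decoded for two ordered instance pairs.
visual_rels = {
    ("inst0", "inst1"): [0.1, 0.1, 0.9],
    ("inst1", "inst0"): [0.1, 0.9, 0.1],
}
target_pair = best_aligned_pair(components["relationship"], visual_rels)
print(target_pair)
```

In this toy setup each component vector is a soft mixture of word vectors rather than a hard parse, which is one way to avoid depending on an external syntax parser; whether the published method uses this form of pooling is not stated in the abstract.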


Pages: 21

