Text-Vision Relationship Alignment for Referring Image Segmentation

Cited by: 0

Authors:

Pu, Mingxing [1]

Luo, Bing [1]

Zhang, Chao [2]

Xu, Li [3]

Xu, Fayou [1]

Kong, Mingming [1]

Affiliations:

[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China

[2] Sichuan Police Coll, Key Lab Intelligent Policing, Luzhou 646000, Peoples R China

[3] Xihua Univ, Sch Sci, Chengdu 610039, Peoples R China

Source:

NEURAL PROCESSING LETTERS | 2024 / Vol. 56 / Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic parsing; Text-vision alignment; Referring image segmentation;

DOI:

10.1007/s11063-024-11487-2

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation aims to segment an object in an image based on a referring expression. Its difficulty lies in aligning expression semantics with visual instances. Existing methods based on semantic reasoning are limited by the performance of an external syntax parser and do not explicitly explore the relationships between visual instances. This article proposes an end-to-end method for referring image segmentation that aligns 'linguistic relationships' with 'visual relationships' and does not rely on an external syntax parser for expression parsing. The Semantic Component Parser (SCP) adaptively and structurally parses the expression, in a learnable manner, into three components: 'subject', 'object', and 'linguistic relationship'. The Instances Activation Map Module (IAM) locates multiple visual instances based on the subject and object. In addition, the Relationship Based Visual Localization Module (RBVL) first enables each instance in the image to learn global knowledge, then decodes the visual relationships between these visual instances, and finally aligns the visual relationships with the linguistic relationships to locate the target object more accurately. The experimental results show that the proposed method improves performance by 4-9% compared with the baseline method on multiple referring image segmentation datasets.
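The relationship-alignment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper's SCP, IAM, and RBVL are learned modules, whereas here each visual instance is a hand-written feature vector, the "visual relationship" between two instances is simply their feature difference, and alignment is plain cosine similarity. The names `pairwise_relation` and `locate_target` are hypothetical and exist only for this illustration.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_relation(subject_feat, object_feat):
    # Assumption: a feature difference stands in for a learned
    # visual-relationship decoder.
    return [s - o for s, o in zip(subject_feat, object_feat)]


def locate_target(instances, linguistic_relation):
    """Return the ordered (subject, object) index pair whose visual
    relationship best matches the linguistic relationship, plus its score."""
    best_pair, best_score = None, -math.inf
    for i, j in permutations(range(len(instances)), 2):
        score = cosine(pairwise_relation(instances[i], instances[j]),
                       linguistic_relation)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score


# Toy data: instances are 2-D centroids (x, y) with y pointing up,
# and the relationship "above" is encoded as the direction (0, 1).
instances = [(1.0, 5.0), (4.0, 1.0), (8.0, 5.0)]
above = (0.0, 1.0)
best_pair, best_score = locate_target(instances, above)
print(best_pair, best_score)
```

In this toy setting the first index of the returned pair plays the role of the referred target (the subject), and the second is the object it is related to. A real system would first restrict the candidates to instances matching the parsed subject and object before scoring relationships.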


Pages: 21

