Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0

Authors:

Wang, Jianming ^{[1,2]}

Cui, Enjie ^{[3]}

Liu, Kunliang ^{[1]}

Sun, Yukuan ^{[3]}

Liang, Jiayu ^{[1]}

Yuan, Chunmiao ^{[1]}

Duan, Xiaojie ^{[3]}

Jin, Guanghao ^{[1,4]}

Chung, Tae-Sun ^{[5]}

Affiliations:

[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China

[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China

[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea

Source:

IET COMPUTER VISION | 2020 / Volume 14 / Issue 08

Funding:

National Research Foundation of Singapore; National Natural Science Foundation of China;

Keywords:

SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;

DOI:

10.1049/iet-cvi.2019.0483

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The task of referring expression comprehension (REC) is to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is located in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell their robot 'please take the laptop on the table to me' when, in fact, the laptop is no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatching and explains the mismatch through linguistic feedback. The authors' REC model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. They built a data set called NP-RefCOCO+ from RefCOCO+, including both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+. The negative samples are expression-image pairs from RefCOCO+ whose expressions have been replaced. They evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method in dealing with expression-image mismatching.
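The positive/negative construction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the replacement expressions are chosen, so the sketch assumes, purely for illustration, that each negative sample borrows an expression written for a different image. The function name build_samples, the tuple layout, and the example pairs are hypothetical and are not taken from the paper or from RefCOCO+.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]         # (image_id, expression)
Sample = Tuple[str, str, int]  # (image_id, expression, label): 1 = match, 0 = mismatch


def build_samples(pairs: List[Pair], seed: int = 0) -> List[Sample]:
    """Keep every original pair as a positive and add one negative per pair.

    Assumption (not from the paper): a negative is made by swapping in an
    expression that was written for a different image.
    """
    rng = random.Random(seed)
    samples: List[Sample] = [(img, expr, 1) for img, expr in pairs]
    for img, _ in pairs:
        # Candidate replacements: expressions attached to other images.
        candidates = [e for other, e in pairs if other != img]
        if not candidates:
            continue  # no other image to borrow an expression from
        samples.append((img, rng.choice(candidates), 0))
    return samples


# Hypothetical toy data standing in for RefCOCO+ expression-image pairs.
pairs = [
    ("img_001", "laptop on the table"),
    ("img_002", "man in red shirt"),
    ("img_003", "left giraffe"),
]
samples = build_samples(pairs)
```

A borrowed expression could still happen to describe something in the new image, so a real pipeline would need an extra check that the replaced expression truly mismatches; the sketch omits that step.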


Pages: 625 - 633

Number of pages: 9


