Referring expression comprehension model with matching detection and linguistic feedback

Times cited: 0

Authors:

Wang, Jianming [1,2]
Cui, Enjie [3]
Liu, Kunliang [1]
Sun, Yukuan [3]
Liang, Jiayu [1]
Yuan, Chunmiao [1]
Duan, Xiaojie [3]
Jin, Guanghao [1,4]
Chung, Tae-Sun [5]

Affiliations:

[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China

[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China

[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea

Source:

IET COMPUTER VISION | 2020 / Vol. 14 / No. 8

Funding:

National Research Foundation of Singapore; National Natural Science Foundation of China

Keywords:

SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;

DOI:

10.1049/iet-cvi.2019.0483

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The task of referring expression comprehension (REC) is to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is present in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell a robot 'please bring me the laptop on the table' when the laptop is no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatches and explains each mismatch through linguistic feedback. The model consists of four modules: an expression parsing module, an entity detection module, a relationship detection module, and a matching detection module. The authors build a data set, NP-RefCOCO+, from RefCOCO+ that contains both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+; the negative samples are RefCOCO+ expression-image pairs whose expressions have been replaced. They evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method in dealing with expression-image mismatching.
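The abstract describes NP-RefCOCO+ only at a high level: positives are the original RefCOCO+ pairs and negatives are pairs whose expressions have been replaced. The Python sketch that follows illustrates one way such a positive/negative split could be built. It is not the authors' procedure. The replacement source (an expression annotated on a different image), the ratio of one negative per positive, and the names `Sample` and `build_matching_dataset` are all hypothetical assumptions made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Sample:
    image_id: int
    expression: str
    label: int  # 1 = positive (original pair), 0 = negative (expression replaced)


def build_matching_dataset(
    pairs: List[Tuple[int, str]], seed: int = 0
) -> List[Sample]:
    """Hypothetical helper, not the authors' code.

    `pairs` holds (image_id, expression) tuples from a RefCOCO+-style
    annotation file. Every original pair becomes a positive sample. Each
    pair also yields, when a candidate exists, one negative sample whose
    expression is swapped for one annotated on a different image.
    """
    rng = random.Random(seed)

    # Collect every expression already attached to each image.
    exprs_by_image: Dict[int, Set[str]] = defaultdict(set)
    for image_id, expr in pairs:
        exprs_by_image[image_id].add(expr)

    # Positives: the original expression-image pairs, unchanged.
    samples = [Sample(img, expr, 1) for img, expr in pairs]

    for img, _ in pairs:
        # Assumption: borrow an expression from another image, skipping any
        # string that is also a true annotation of this image.
        candidates = [
            e for other, e in pairs
            if other != img and e not in exprs_by_image[img]
        ]
        if candidates:
            samples.append(Sample(img, rng.choice(candidates), 0))
    return samples


if __name__ == "__main__":
    demo = [
        (1, "red car on the left"),
        (1, "man in blue shirt"),
        (2, "laptop on the table"),
        (3, "dog near the door"),
    ]
    for s in build_matching_dataset(demo):
        print(s)
```

A real pipeline would also need to check that a borrowed expression does not happen to describe an object that is actually present in the image (a generic phrase such as 'man on the left' might). This sketch only filters out exact duplicates of the image's own annotations.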


Pages: 625-633

Number of pages: 9
