Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0

Authors:

Wang, Jianming [1,2]

Cui, Enjie [3]

Liu, Kunliang [1]

Sun, Yukuan [3]

Liang, Jiayu [1]

Yuan, Chunmiao [1]

Duan, Xiaojie [3]

Jin, Guanghao [1,4]

Chung, Tae-Sun [5]

Affiliations:

[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China

[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China

[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China

[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea

Source:

IET COMPUTER VISION | 2020 / Vol. 14 / Issue 08

Funding:

National Research Foundation of Singapore; National Natural Science Foundation of China;

Keywords:

SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;

DOI:

10.1049/iet-cvi.2019.0483

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The task of referring expression comprehension (REC) is to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is located in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell their robot 'please take the laptop on the table to me' when, in fact, the laptop is no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatching and explains the mismatch through linguistic feedback. The model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. The authors built a data set called NP-RefCOCO+ from RefCOCO+ that includes both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+. The negative samples are expression-image pairs from RefCOCO+ whose expressions have been replaced. They evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method for dealing with expression-image mismatching.
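The data set construction described in the abstract can be illustrated with a minimal sketch. The abstract states only that negative samples are RefCOCO+ pairs whose expressions are replaced; it does not say how the replacement is chosen. The sketch below assumes, purely for illustration, that each negative keeps the image and borrows an expression belonging to a different image. The names `Sample`, `build_pairs`, and the toy data are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    expression: str
    matches: bool  # True for a positive pair, False for a negative pair


def build_pairs(pairs, seed=0):
    """Return the original pairs as positives plus one negative per pair.

    `pairs` is a list of (image_id, expression) tuples. Each negative keeps
    the image and substitutes an expression taken from a different image
    (an assumed replacement strategy, not necessarily the authors').
    """
    rng = random.Random(seed)
    samples = [Sample(img, expr, True) for img, expr in pairs]
    for img, expr in pairs:
        # Candidate replacements: expressions that belong to other images.
        candidates = [e for other, e in pairs if other != img and e != expr]
        if not candidates:
            continue  # no replacement available for this image
        samples.append(Sample(img, rng.choice(candidates), False))
    return samples


# Toy stand-in for expression-image pairs (hypothetical data).
toy = [
    ("img_1", "laptop on the table"),
    ("img_2", "man in red shirt"),
    ("img_3", "dog next to the sofa"),
]
dataset = build_pairs(toy)
```

A borrowed expression could by chance still describe something in the new image, so a real pipeline would likely need an additional check before labelling such a pair as a mismatch; that check is omitted here.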


Pages: 625-633

Page count: 9
