Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0
Authors
Wang, Jianming [1, 2]
Cui, Enjie [3]
Liu, Kunliang [1]
Sun, Yukuan [3]
Liang, Jiayu [1]
Yuan, Chunmiao [1]
Duan, Xiaojie [3]
Jin, Guanghao [1, 4]
Chung, Tae-Sun [5]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China
[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation of Singapore; National Natural Science Foundation of China
Keywords
SEGMENTATION; RECOGNITION; FEATURES; TEXTURE
DOI
10.1049/iet-cvi.2019.0483
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of referring expression comprehension (REC) is to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is present in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell their robot 'please bring me the laptop on the table' when the laptop is in fact no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatching and explains the mismatch through linguistic feedback. The model consists of four modules: an expression parsing module, an entity detection module, a relationship detection module, and a matching detection module. The authors built a data set, NP-RefCOCO+, from RefCOCO+ that contains both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+; the negative samples are RefCOCO+ expression-image pairs whose expressions have been replaced. The authors evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method in handling expression-image mismatching.
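The abstract says only that negative samples are RefCOCO+ pairs whose expressions are replaced; it does not say how the replacements are chosen. The following is a minimal sketch of that positive/negative construction, assuming each replacement is an expression drawn at random from a *different* image. The `Pair` record, the `(image_id, expression)` input format and `build_pos_neg_pairs` are hypothetical names for illustration, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    image_id: int
    expression: str
    label: int  # 1 = positive (original pair), 0 = negative (replaced expression)


def build_pos_neg_pairs(refs, seed=0):
    """Build positive and negative expression-image pairs.

    refs: list of (image_id, expression) tuples, a hypothetical stand-in
    for RefCOCO+-style annotations.
    Assumption: a negative keeps the image but swaps in an expression
    annotated on a different image (the paper's exact rule is not given).
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    positives = [Pair(img, expr, 1) for img, expr in refs]
    negatives = []
    for img, _ in refs:
        # Candidate replacements: expressions written for other images only.
        candidates = [expr for other, expr in refs if other != img]
        if candidates:
            negatives.append(Pair(img, rng.choice(candidates), 0))
    return positives + negatives


if __name__ == "__main__":
    refs = [(1, "laptop on the table"), (1, "red mug"), (2, "dog on the sofa")]
    for pair in build_pos_neg_pairs(refs):
        print(pair)
```

Random replacement does not guarantee a true mismatch: an expression written for another image may still describe an object that happens to be present in this one. A real data set would therefore need some extra filtering or checking, which the abstract does not describe.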
Pages: 625-633
Number of pages: 9