Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0
Authors
Wang, Jianming [1, 2]
Cui, Enjie [3 ]
Liu, Kunliang [1 ]
Sun, Yukuan [3 ]
Liang, Jiayu [1 ]
Yuan, Chunmiao [1 ]
Duan, Xiaojie [3 ]
Jin, Guanghao [1, 4]
Chung, Tae-Sun [5 ]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China
[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;
DOI
10.1049/iet-cvi.2019.0483
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring expression comprehension (REC) aims to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is present in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell his robot 'please take the laptop on the table to me' when, in fact, the laptop is no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatching and explains the mismatch through linguistic feedback. The model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. The authors also built a data set called NP-RefCOCO+ from RefCOCO+ that includes both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+; the negative samples are expression-image pairs from RefCOCO+ whose expressions have been replaced. They evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method in dealing with expression-image mismatching.
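The positive/negative split described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the replacement expressions are chosen, so the rule used here (borrowing an expression that belongs to a different image) is an assumption, and the names `Sample` and `build_samples` are hypothetical rather than taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    expression: str
    matches: bool  # True = positive (original pair), False = negative (replaced expression)


def build_samples(pairs, seed=0):
    """Build positive and negative samples from (image_id, expression) pairs.

    Assumption (not stated in the abstract): a negative sample keeps the image
    and swaps in an expression that originally belonged to a different image.
    A real data set would also need to check that the borrowed expression does
    not happen to describe something in the new image.
    """
    rng = random.Random(seed)
    samples = []
    for image_id, expression in pairs:
        # Positive sample: the original expression-image pair.
        samples.append(Sample(image_id, expression, True))
        # Negative sample: same image, expression taken from another image.
        candidates = [e for other_id, e in pairs
                      if other_id != image_id and e != expression]
        if candidates:
            samples.append(Sample(image_id, rng.choice(candidates), False))
    return samples


pairs = [("img_1", "laptop on the table"), ("img_2", "man in red shirt")]
for sample in build_samples(pairs):
    print(sample)
```

Each input pair yields one positive and, when a replacement is available, one negative sample, so a matching detection module would see both cases for every image.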
Pages: 625-633
Number of pages: 9
Related papers
50 in total
  • [41] Selective Comprehension for Referring Expression by Prebuilt Entity Dictionary with Modular Networks
    Cui, Enjie
    Wang, Jianming
    Liang, Jiayu
    Jin, Guanghao
    KNOWLEDGE MANAGEMENT AND ACQUISITION FOR INTELLIGENT SYSTEMS (PKAW 2018), 2018, 11016 : 211 - 220
  • [42] ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
    Subramanian, Sanjay
    Merrill, Will
    Darrell, Trevor
    Gardner, Matt
    Singh, Sameer
    Rohrbach, Anna
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5198 - 5215
  • [43] Bridging the Gap between Expression and Scene Text for Referring Expression Comprehension (Student Abstract)
    Bu, Yuqi
    Xie, Jiayuan
    Li, Liuwu
    Liu, Qiong
    Cai, Yi
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 12921 - 12922
  • [44] Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
    Sun, Mingjie
    Xiao, Jimin
    Lim, Eng Gee
    Liu, Si
    Goulermas, John Y.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (11) : 4189 - 4195
  • [45] One for all: One-stage referring expression comprehension with dynamic reasoning
    Zhang, Zhipeng
    Wei, Zhimin
    Huang, Zhongzhen
    Niu, Rui
    Wang, Peng
    NEUROCOMPUTING, 2023, 518 : 523 - 532
  • [46] Rethinking and Improving Feature Pyramids for One-Stage Referring Expression Comprehension
    Suo, Wei
    Sun, Mengyang
    Wang, Peng
    Zhang, Yanning
    Wu, Qi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 854 - 864
  • [47] CLEVR-Implicit: A Diagnostic Dataset for Implicit Reasoning in Referring Expression Comprehension
    Zhang, Jingwei
    Wu, Xin
    Cai, Yi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12820 - 12830
  • [48] Video Referring Expression Comprehension via Transformer with Content-conditioned Query
    Jiang, Ji
    Cao, Meng
    Song, Tengtao
    Chen, Long
    Wang, Yi
    Zou, Yuexian
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON DEEP MULTIMODAL LEARNING FOR INFORMATION RETRIEVAL, MMIR 2023, 2023, : 39 - 48
  • [49] Language-Conditioned Region Proposal and Retrieval Network for Referring Expression Comprehension
    Xie, Yanwei
    Liu, Daqing
    Chen, Xuejin
    Zha, Zheng-Jun
    MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021, : 14 - 22
  • [50] LGR-NET: Language Guided Reasoning Network for Referring Expression Comprehension
    Lu, Mingcong
    Li, Ruifan
    Feng, Fangxiang
    Ma, Zhanyu
    Wang, Xiaojie
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7771 - 7784