Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0
Authors
Wang, Jianming [1, 2]
Cui, Enjie [3]
Liu, Kunliang [1]
Sun, Yukuan [3]
Liang, Jiayu [1]
Yuan, Chunmiao [1]
Duan, Xiaojie [3]
Jin, Guanghao [1, 4]
Chung, Tae-Sun [5]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China
[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;
DOI
10.1049/iet-cvi.2019.0483
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The task of referring expression comprehension (REC) is to localise the image region of a specific object described by a natural language expression. All existing REC methods assume that the object described by the referring expression is located in the given image. However, this assumption does not hold in some real applications. For example, a visually impaired user might tell their robot 'please take the laptop on the table to me' when, in fact, the laptop is no longer on the table. To address this problem, the authors propose a novel REC model that handles expression-image mismatching and explains the mismatch through linguistic feedback. The model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. The authors built a data set called NP-RefCOCO+ from RefCOCO+ that includes both positive and negative samples. The positive samples are the original expression-image pairs in RefCOCO+. The negative samples are expression-image pairs from RefCOCO+ whose expressions have been replaced. They evaluate the model on NP-RefCOCO+, and the experimental results show the advantages of their method in dealing with expression-image mismatching.
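The positive/negative construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how replacement expressions are chosen, so the rule used here (borrow an expression that belongs to a different image) is an assumption, and the names `Sample` and `build_np_pairs` are hypothetical, not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    image_id: str
    expression: str
    is_match: bool  # True = positive sample, False = negative sample


def build_np_pairs(pairs, seed=0):
    """Build positive and negative samples from (image_id, expression) pairs.

    Assumption (not stated in the abstract): a negative sample keeps the
    image and swaps in an expression taken from a different image.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    samples = []
    for image_id, expression in pairs:
        # Positive sample: the original expression-image pair.
        samples.append(Sample(image_id, expression, True))
        # Candidate replacements: expressions attached to other images.
        candidates = [
            other_expr
            for other_img, other_expr in pairs
            if other_img != image_id and other_expr != expression
        ]
        if candidates:
            # Negative sample: same image, replaced expression.
            samples.append(Sample(image_id, rng.choice(candidates), False))
    return samples


if __name__ == "__main__":
    toy_pairs = [
        ("img1", "man in red shirt"),
        ("img2", "laptop on the table"),
        ("img3", "dog on the left"),
    ]
    for sample in build_np_pairs(toy_pairs):
        print(sample)
```

A borrowed expression can still happen to describe something in the new image, so a real data set would need an extra check that the replaced expression truly mismatches the image; this sketch does not attempt that.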
Pages: 625-633
Page count: 9
Related papers
50 in total
  • [1] Dynamic Graph Attention for Referring Expression Comprehension
    Yang, Sibei
    Li, Guanbin
    Yu, Yizhou
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4643 - 4652
  • [2] Exploring Logical Reasoning for Referring Expression Comprehension
    Cheng, Ying
    Wang, Ruize
    Yu, Jiashuo
    Zhao, Rui-Wei
    Zhang, Yuejie
    Feng, Rui
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5047 - 5055
  • [3] InterREC: An Interpretable Method for Referring Expression Comprehension
    Wang, Wenbin
    Pagnucco, Maurice
    Xu, Chengpei
    Song, Yang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9330 - 9342
  • [4] Image Segmentation With Language Referring Expression and Comprehension
    Sun, Jiaxing
    Li, Yujie
    Cai, Jintong
    Lu, Huimin
    Serikawa, Seiichi
    IEEE SENSORS JOURNAL, 2022, 22 (18) : 17406 - 17413
  • [5] ScanFormer: Referring Expression Comprehension by Iteratively Scanning
    Su, Wei
    Miao, Peihan
    Dou, Huanzhang
    Li, Xi
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13449 - 13458
  • [6] Referring Expression Generation and Comprehension via Attributes
    Liu, Jingyu
    Wang, Liang
    Yang, Ming-Hsuan
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4866 - 4874
  • [7] Revisiting Counterfactual Problems in Referring Expression Comprehension
    Yu, Zhihan
    Li, Ruifan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13438 - 13448
  • [8] Correspondence Matters for Video Referring Expression Comprehension
    Cao, Meng
    Jiang, Ji
    Chen, Long
    Zou, Yuexian
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4967 - 4976
  • [9] Inexactly Matched Referring Expression Comprehension With Rationale
    Li, Xiaochuan
    Fan, Baoyu
    Zhang, Runze
    Zhao, Kun
    Guo, Zhenhua
    Zhao, Yaqian
    Li, Rengang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3937 - 3950
  • [10] Referring Expression Comprehension: A Survey of Methods and Datasets
    Qiao, Yanyuan
    Deng, Chaorui
    Wu, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 4426 - 4440