Referring expression comprehension model with matching detection and linguistic feedback

Cited by: 0
Authors
Wang, Jianming [1, 2]
Cui, Enjie [3]
Liu, Kunliang [1]
Sun, Yukuan [3]
Liang, Jiayu [1]
Yuan, Chunmiao [1]
Duan, Xiaojie [3]
Jin, Guanghao [1, 4]
Chung, Tae-Sun [5]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tiangong Univ, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin, Peoples R China
[3] Tiangong Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[4] Tiangong Univ, Tianjin Int Joint Res & Dev Ctr Autonomous Intell, Tianjin, Peoples R China
[5] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
SEGMENTATION; RECOGNITION; FEATURES; TEXTURE;
DOI
10.1049/iet-cvi.2019.0483
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The task of referring expression comprehension (REC) is to localise an image region of a specific object described by a natural language expression, and all existing REC methods assume that the object described by the referring expression must be located in the given image. However, this assumption is not correct in some real applications. For example, a visually impaired user might tell his robot 'please take the laptop on the table to me'. In fact, the laptop is not on the table anymore. To address this problem, the authors propose a novel REC model to deal with the situation where expression-image mismatching occurs and explain the mismatching by linguistic feedback. The authors' REC model consists of four modules: the expression parsing module, the entity detection module, the relationship detection module, and the matching detection module. They built a data set called NP-RefCOCO+ from RefCOCO+ including both positive samples and negative samples. The positive samples are original expression-image pairs in RefCOCO+. The negative samples are the expression-image pairs in RefCOCO+, whose expressions are replaced. They evaluate the model on NP-RefCOCO+ and the experimental results show the advantages of their method for dealing with the problem of expression-image mismatching.
Pages: 625-633
Number of pages: 9