Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

Cited by: 1
Authors
Wang, Yaodong [1]
Ji, Zhong [1]
Wang, Di [1]
Pang, Yanwei [1]
Li, Xuelong [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence, OPt & Elect iOPEN, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
referring expression comprehension; unsupervised learning; visual semantic parsing; RECONSTRUCTION
DOI
10.1016/j.knosys.2023.111318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring Expression Comprehension (REC) is the task of grounding, in the form of a bounding box, the specific object in an image that a given referring query describes. Existing fully-supervised or weakly-supervised REC methods rely on either manually annotated regional coordinates or query texts. In this paper, we propose an unsupervised training paradigm for the REC task that does not require any manually annotated data. Specifically, we introduce a Visual-Semantic-Parsing-based Unsupervised Referring Expression Comprehension framework (VUREC), which leverages a Visual Semantic Parser (VSP) as its core module to recognize the rich semantic information in images and construct pseudo-region-query pairs as the training supervision for REC. The VSP comprises a Scene Graph Parser (SGP) and a Visual Concept Detector (VCD) that can detect the locations, categories, and attributes of objects, as well as the visual relationships among them in images. Furthermore, we present a Referring Expression Reasoning (RER) model that utilizes a Multi-Modal Cascade Attention Decoder (MCAD) for fine-grained multi-modality fusion and directly regresses the four coordinates of the referential object. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, demonstrate the effectiveness of the proposed method.
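The abstract does not specify how pseudo-region-query pairs are assembled, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a parser has already produced object boxes, categories, attributes, and relation triples; the names `DetectedObject`, `Relation`, and `build_pseudo_pairs`, as well as the query templates, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) layout


@dataclass
class DetectedObject:
    """Hypothetical parser output for one object."""
    box: Box
    category: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """Hypothetical relation triple, indexing into the object list."""
    subject: int
    predicate: str
    obj: int


def noun_phrase(o: DetectedObject) -> str:
    # Assumed template: attributes followed by the category name.
    return " ".join(o.attributes + [o.category])


def build_pseudo_pairs(objects: List[DetectedObject],
                       relations: List[Relation]) -> List[Tuple[Box, str]]:
    """Pair each box with template-generated pseudo queries."""
    pairs = [(o.box, noun_phrase(o)) for o in objects]
    for r in relations:
        subj, target = objects[r.subject], objects[r.obj]
        # The relational query refers to the subject's region.
        query = f"{noun_phrase(subj)} {r.predicate} {target.category}"
        pairs.append((subj.box, query))
    return pairs


objects = [
    DetectedObject((10, 20, 50, 80), "dog", ["brown"]),
    DetectedObject((0, 60, 200, 40), "sofa"),
]
relations = [Relation(subject=0, predicate="on", obj=1)]
pairs = build_pseudo_pairs(objects, relations)
```

In this toy setup each pair could serve as one pseudo-labelled training example for a grounding model; a real system would presumably need to filter ambiguous queries, which this sketch does not attempt.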
Pages: 10