Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

Cited by: 1
Authors
Wang, Yaodong [1]
Ji, Zhong [1]
Wang, Di [1]
Pang, Yanwei [1]
Li, Xuelong [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence, OPt & Elect iOPEN, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
referring expression comprehension; unsupervised learning; visual semantic parsing; RECONSTRUCTION;
DOI
10.1016/j.knosys.2023.111318
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring Expression Comprehension (REC) is the task of grounding a specific object in an image, in the form of a bounding box, based on a given referring query. Existing fully-supervised or weakly-supervised REC methods rely on either manually annotated regional coordinates or query texts. In this paper, we propose an unsupervised training paradigm for the REC task that does not require any manually annotated data. Specifically, we introduce a Visual-Semantic-Parsing-based Unsupervised Referring Expression Comprehension framework (VUREC), which leverages a Visual Semantic Parser (VSP) as its core module to recognize the rich semantic information in images and construct pseudo-region-query pairs as the training supervision for REC. The VSP comprises a Scene Graph Parser (SGP) and a Visual Concept Detector (VCD), which together detect the locations, categories, and attributes of objects in images, as well as the visual relationships among them. Furthermore, we present a Referring Expression Reasoning (RER) model that uses a Multi-Modal Cascade Attention Decoder (MCAD) for fine-grained multi-modality fusion and directly regresses the four coordinates of the referred object. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+ and RefCOCOg, demonstrate the effectiveness of the proposed method.
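The idea of building pseudo-region-query pairs from parsed visual semantics can be illustrated with a minimal sketch. It is not the authors' implementation: the `DetectedObject` and `Relation` types, the `build_pseudo_pairs` function, the box convention, and the two text templates are all hypothetical, chosen only to show how detected categories, attributes, and relationships could be turned into (query, box) training pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class DetectedObject:
    """One object as a parser might report it (hypothetical structure)."""
    category: str
    box: Box
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A subject-predicate-object triple; indices refer to the object list."""
    subject: int
    predicate: str
    object: int


def describe(obj: DetectedObject) -> str:
    # Attribute + category template, e.g. "brown dog".
    return " ".join(obj.attributes + [obj.category])


def build_pseudo_pairs(
    objects: List[DetectedObject], relations: List[Relation]
) -> List[Tuple[str, Box]]:
    """Build (pseudo query, region) pairs from assumed templates."""
    pairs = [(describe(obj), obj.box) for obj in objects]
    for rel in relations:
        subj, obj = objects[rel.subject], objects[rel.object]
        # Relationship template; the query refers to the subject's region.
        query = f"{describe(subj)} {rel.predicate} {describe(obj)}"
        pairs.append((query, subj.box))
    return pairs


objects = [
    DetectedObject("dog", (10.0, 20.0, 110.0, 140.0), ["brown"]),
    DetectedObject("sofa", (0.0, 80.0, 300.0, 220.0)),
]
relations = [Relation(subject=0, predicate="on", object=1)]
pairs = build_pseudo_pairs(objects, relations)
for query, box in pairs:
    print(query, box)
```

In this sketch each pair could serve as one training sample for a grounding model that takes the query and image and regresses the four box coordinates; how the actual framework filters or samples its pairs is not stated in the abstract.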
Pages: 10
Related papers
50 in total
  • [41] Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing
    Heck, Larry
    Hakkani-Tur, Dilek
    Tur, Gokhan
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1593 - 1597
  • [42] Semantic separator learning and its applications in unsupervised Chinese text parsing
    Yuming Wu
    Xiaodong Luo
    Zhen Yang
    Frontiers of Computer Science, 2013, 7 : 55 - 68
  • [43] Towards Comparability of Linguistic Graph Banks for Semantic Parsing
    Oepen, Stephan
    Kuhlmann, Marco
    Miyao, Yusuke
    Zeman, Daniel
    Cinkova, Silvie
    Flickinger, Dan
    Hajic, Jan
    Ivanova, Angelina
    Uresova, Zdenka
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3991 - 3995
  • [44] RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
    Jin, Lei
    Luo, Gen
    Zhou, Yiyi
    Sun, Xiaoshuai
    Jiang, Guannan
    Shu, Annan
    Ji, Rongrong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2681 - 2690
  • [45] Bottom-Up and Bidirectional Alignment for Referring Expression Comprehension
    Li, Liuwu
    Bu, Yuqi
    Cai, Yi
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5167 - 5175
  • [46] Attribute-Guided Attention for Referring Expression Generation and Comprehension
    Liu, Jingyu
    Wang, Wei
    Wang, Liang
    Yang, Ming-Hsuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 5244 - 5258
  • [47] RETR: END-TO-END REFERRING EXPRESSION COMPREHENSION WITH TRANSFORMERS
    Rui, Yang
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [48] Continual Referring Expression Comprehension via Dual Modular Memorization
    Shen, Heng Tao
    Chen, Cheng
    Wang, Peng
    Gao, Lianli
    Wang, Meng
    Song, Jingkuan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6694 - 6706
  • [49] Referring expression comprehension model with matching detection and linguistic feedback
    Wang, Jianming
    Cui, Enjie
    Liu, Kunliang
    Sun, Yukuan
    Liang, Jiayu
    Yuan, Chunmiao
    Duan, Xiaojie
    Jin, Guanghao
    Chung, Tae-Sun
    IET COMPUTER VISION, 2020, 14 (08) : 625 - 633
  • [50] Towards Unsupervised Open World Semantic Segmentation
    Uhlemeyer, Svenja
    Rottmann, Matthias
    Gottschalk, Hanno
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 1981 - 1991