Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

Cited by: 1
Authors
Wang, Yaodong [1]
Ji, Zhong [1]
Wang, Di [1]
Pang, Yanwei [1]
Li, Xuelong [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence, Opt & Elect iOPEN, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
referring expression comprehension; unsupervised learning; visual semantic parsing; reconstruction
DOI
10.1016/j.knosys.2023.111318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Referring Expression Comprehension (REC) is the task of localizing, with a bounding box, the specific object in an image that a given referring query describes. Existing fully supervised or weakly supervised REC methods rely on either manually annotated region coordinates or query texts. In this paper, we propose an unsupervised training paradigm for the REC task that does not require any manually annotated data. Specifically, we introduce a Visual-Semantic-Parsing-based Unsupervised Referring Expression Comprehension framework (VUREC), which leverages a Visual Semantic Parser (VSP) as its core module to recognize the rich semantic information in images and construct pseudo region-query pairs as the training supervision for REC. The VSP comprises a Scene Graph Parser (SGP) and a Visual Concept Detector (VCD) that detect the locations, categories, and attributes of objects in images, as well as the visual relationships among them. Furthermore, we present a Referring Expression Reasoning (RER) model that uses a Multi-Modal Cascade Attention Decoder (MCAD) for fine-grained multi-modal fusion and directly regresses the four coordinates of the referred object. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, demonstrate the effectiveness of the proposed method.
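
To make the idea of pseudo region-query pairs concrete, the following is a minimal sketch of how detected objects, attributes, and relationships could be turned into query strings paired with bounding boxes. It is an illustration under stated assumptions, not the authors' implementation: the names `DetectedObject`, `Relation`, and `build_pseudo_pairs`, the `(x1, y1, x2, y2)` box layout, and the simple "attributes + category" and "subject predicate object" templates are all hypothetical, and the abstract does not specify how VUREC actually composes its queries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class DetectedObject:
    """One parsed object: category, box, and optional attributes (hypothetical type)."""
    category: str
    box: Box
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A visual relationship between two objects, given by their list indices."""
    subject: int
    predicate: str
    object: int


def noun_phrase(obj: DetectedObject) -> str:
    # Assumed template: attributes first, then the category name.
    return " ".join(obj.attributes + [obj.category])


def build_pseudo_pairs(
    objects: List[DetectedObject], relations: List[Relation]
) -> List[Tuple[str, Box]]:
    """Return (pseudo query, target box) pairs built from parser output."""
    pairs: List[Tuple[str, Box]] = []
    # Attribute-level queries: each object is described by its own phrase.
    for obj in objects:
        pairs.append((noun_phrase(obj), obj.box))
    # Relation-level queries: the subject of the relation is the referred region.
    for rel in relations:
        subj, other = objects[rel.subject], objects[rel.object]
        query = f"{noun_phrase(subj)} {rel.predicate} {noun_phrase(other)}"
        pairs.append((query, subj.box))
    return pairs


objects = [
    DetectedObject("dog", (10.0, 20.0, 110.0, 220.0), ["brown"]),
    DetectedObject("sofa", (0.0, 100.0, 300.0, 300.0)),
]
relations = [Relation(subject=0, predicate="on", object=1)]
pairs = build_pseudo_pairs(objects, relations)
for query, box in pairs:
    print(query, box)
```

In this sketch each pair could serve as a training sample for a model that regresses four box coordinates from an image and a query; how such pairs are filtered or weighted in practice is left open here.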
Pages: 10