Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding

Cited by: 3
Authors
Shaharabany, Tal [1]
Wolf, Lior [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
DOI
10.1109/CVPR52729.2023.00669
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase grounding model by considering self-similarity maps extracted from the latent representation of the model's image encoder. Our main insights are that these maps resemble localization maps and that by combining such maps, one can obtain useful pseudo-labels for performing self-training. Our results surpass, by a large margin, the state of the art in weakly supervised phrase grounding. A similar gap in performance is obtained for a recently proposed downstream task called WWbL, in which only the image is given as input, without any text. Our code is available at https://github.com/talshaharabany/Similarity-Maps-for-Self-Training-Weakly-Supervised-Phrase-Grounding.
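The idea in the abstract, that self-similarity maps over encoder features look like localization maps and can be combined into pseudo-labels, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`self_similarity_maps`, `combine_maps`, `pseudo_label`) are hypothetical, cosine similarity stands in for whatever similarity the paper uses, and averaging followed by a fixed threshold stands in for the paper's actual way of selecting and combining maps.

```python
import numpy as np


def self_similarity_maps(features, eps=1e-8):
    """Cosine self-similarity maps for a (C, H, W) feature tensor.

    Returns an array of shape (H*W, H, W): map i holds the similarity of
    spatial location i to every location in the feature grid.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w).T  # (H*W, C), one row per location
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    sim = flat @ flat.T  # (H*W, H*W) pairwise cosine similarities
    return sim.reshape(h * w, h, w)


def combine_maps(maps, anchor_indices, eps=1e-8):
    """Average the maps of the chosen anchor locations, then min-max
    normalize to [0, 1]. (Averaging is an illustrative assumption.)"""
    combined = maps[anchor_indices].mean(axis=0)
    lo, hi = combined.min(), combined.max()
    return (combined - lo) / (hi - lo + eps)


def pseudo_label(heatmap, threshold=0.5):
    """Binarize a [0, 1] heatmap into a mask usable as a pseudo-label.
    (The fixed threshold is an illustrative assumption.)"""
    return (heatmap >= threshold).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 5))  # stand-in for encoder features
    maps = self_similarity_maps(feats)
    mask = pseudo_label(combine_maps(maps, [0, 7]))
    print(maps.shape, mask.shape)
```

In a self-training loop, a mask like this would serve as the target for a further round of training; how the anchor locations are chosen and how the maps are weighted is specific to the paper and is not reproduced here.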
Pages: 6925-6934
Page count: 10
Related papers
50 in total
  • [11] Iterative Proposal Refinement for Weakly-Supervised Video Grounding
    School of Electronic and Computer Engineering, Peking University, China
    Unknown
    Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit: 6524-6534
  • [12] Inverse Compositional Learning for Weakly-supervised Relation Grounding
    Li, Huan
    Wei, Ping
    Ma, Zeyu
    Zheng, Nanning
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 15431-15441
  • [13] Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
    Xiao, Fanyi
    Sigal, Leonid
    Lee, Yong Jae
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5253-5262
  • [14] Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Losses
    Shi, Jing
    Xu, Jia
    Gong, Boqing
    Xu, Chenliang
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 10436-10444
  • [15] What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
    Shaharabany, Tal
    Tewel, Yoad
    Wolf, Lior
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [16] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection
    Tsou, Tsung-Lin
    Wu, Tsung-Han
    Hsu, Winston H.
    arXiv, 2023
  • [17] A Dual Reinforcement Learning Framework for Weakly Supervised Phrase Grounding
    Wang, Zhiyu
    Yang, Chao
    Jiang, Bin
    Yuan, Junsong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 394-405
  • [18] Weakly-Supervised Video Object Grounding via Causal Intervention
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45(3): 3933-3948
  • [19] Category-aware self-training for extremely weakly supervised text classification
    Su, Jing
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 269
  • [20] Dynamic updating self-training for semi-weakly supervised object detection
    Zhang, Ming
    Liu, Shuaicheng
    Zeng, Bing
    NEUROCOMPUTING, 2023, 547