Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding

Cited by: 3
Authors
Shaharabany, Tal [1]
Wolf, Lior [1]
Affiliations
[1] Tel Aviv Univ, Tel Aviv, Israel
DOI
10.1109/CVPR52729.2023.00669
CLC classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase grounding model by considering self-similarity maps extracted from the latent representation of the model's image encoder. Our main insights are that these maps resemble localization maps and that, by combining such maps, one can obtain useful pseudo-labels for performing self-training. Our results surpass, by a large margin, the state of the art in weakly supervised phrase grounding. A similar gap in performance is obtained for a recently proposed downstream task called WWbL, in which only the image is input, without any text. Our code is available at https://github.com/talshaharabany/Similarity-Maps-forSelf-Training-Weakly-Supervised-Phrase-Grounding.
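The abstract does not spell out how the self-similarity maps are computed or combined, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes the encoder's latent representation is available as a (C, H, W) array, that a self-similarity map is the cosine similarity between one seed location and every other location, and that maps are combined by simple averaging followed by a threshold. The function names, the seed locations, and the 0.5 threshold are all hypothetical.

```python
import numpy as np

def self_similarity_maps(features, seeds):
    """Cosine-similarity maps between seed locations and all locations.

    features: (C, H, W) array standing in for an encoder's latent map (assumption).
    seeds: list of (row, col) locations (hypothetical choice of seeds).
    Returns an array of shape (len(seeds), H, W) with values in [-1, 1].
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # L2-normalize each spatial feature vector so dot products are cosines.
    norms = np.linalg.norm(flat, axis=0, keepdims=True)
    flat = flat / np.maximum(norms, 1e-8)
    maps = []
    for row, col in seeds:
        query = flat[:, row * w + col]            # feature vector at the seed
        maps.append((query @ flat).reshape(h, w))  # similarity to every location
    return np.stack(maps)

def pseudo_label(maps, threshold=0.5):
    """Average the maps, rescale to [0, 1], and threshold into a binary mask.

    Averaging and thresholding are illustrative assumptions only.
    """
    mean_map = maps.mean(axis=0)
    lo, hi = mean_map.min(), mean_map.max()
    scaled = (mean_map - lo) / max(hi - lo, 1e-8)
    return (scaled >= threshold).astype(np.uint8)

# Random features stand in for a real encoder output.
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 8, 8))
maps = self_similarity_maps(features, seeds=[(2, 3), (5, 5)])
mask = pseudo_label(maps)
```

In a self-training loop, a mask of this kind would serve as a pseudo-label for the localization map predicted by the grounding model; how the paper selects and combines maps is described in the paper and the linked repository.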
Pages: 6925-6934
Page count: 10
Related papers
50 in total
  • [21] A weakly supervised approach to Chinese sentiment classification using partitioned self-training
    Zhang, Pu
    He, Zhongshi
    JOURNAL OF INFORMATION SCIENCE, 2013, 39 (06) : 815 - 831
  • [22] SeLa-MIL: Developing an instance-level classifier via weakly-supervised self-training for whole slide image classification
    Ma, Yingfan
    Yuan, Mingzhi
    Shen, Ao
    Luo, Xiaoyuan
    An, Bohan
    Chen, Xinrong
    Wang, Manning
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2025, 261
  • [23] vtGraphNet: Learning weakly-supervised scene graph for complex visual grounding
    Lyu, Fan
    Feng, Wei
    Wang, Song
    NEUROCOMPUTING, 2020, 413 : 51 - 60
  • [24] Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning
    Wang, Ye
    Lin, Wang
    Zhang, Shengyu
    Jin, Tao
    Li, Linjun
    Cheng, Xize
    Zhao, Zhou
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 10914 - 10932
  • [25] Rethinking Weakly-Supervised Video Temporal Grounding From a Game Perspective
    Fang, Xiang
    Xiong, Zeyu
    Fang, Wanlong
    Qu, Xiaoye
    Chen, Chen
    Dong, Jianfeng
    Tang, Keke
    Zhou, Pan
    Cheng, Yu
    Liu, Daizong
    COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103 : 290 - 311
  • [26] AsyNCE: Disentangling False-Positives for Weakly-Supervised Video Grounding
    Da, Cheng
    Zhang, Yanhao
    Zheng, Yun
    Pan, Pan
    Xu, Yinghui
    Pan, Chunhong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1129 - 1137
  • [27] Weakly-Supervised Generation and Grounding of Visual Descriptions with Conditional Generative Models
    Mavroudi, Effrosyni
    Vidal, Rene
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15523 - 15533
  • [28] Weakly-Supervised Grounding for VQA with Dual Visual-Linguistic Interaction
    Liu, Yi
    Pan, Junwen
    Wang, Qilong
    Chen, Guanlin
    Nie, Weiguo
    Zhang, Yudong
    Gao, Qian
    Hu, Qinghua
    Zhu, Pengfei
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT I, 2024, 14473 : 156 - 169
  • [29] Weakly-Supervised Video Object Grounding via Stable Context Learning
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 760 - 768
  • [30] Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
    Chen, Zhenfang
    Ma, Lin
    Luo, Wenhan
    Wong, Kwan-Yee K.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1884 - 1894