Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Cited by: 27
Authors
Wang, Liwei [1]
Huang, Jing [2]
Li, Yin [3]
Xu, Kun [4]
Yang, Zhengyuan [5]
Yu, Dong [4]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Illinois, Champaign, IL USA
[3] Univ Wisconsin Madison, Madison, WI USA
[4] Tencent AI Lab, Bellevue, WA USA
[5] Univ Rochester, Rochester, NY 14627 USA
DOI
10.1109/CVPR46437.2021.01387
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised phrase grounding aims at learning region-phrase correspondences using only image-sentence pairs. A major challenge thus lies in the missing links between image regions and sentence phrases during training. To address this challenge, we leverage a generic object detector at training time, and propose a contrastive learning framework that accounts for both region-phrase and image-sentence matching. Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed. Importantly, our region-phrase score function is learned by distilling from soft matching scores between the detected object names and candidate phrases within an image-sentence pair, while the image-sentence score function is supervised by ground-truth image-sentence pairs. The design of such score functions removes the need for object detection at test time, thereby significantly reducing the inference cost. Without bells and whistles, our approach achieves state-of-the-art results on visual phrase grounding, surpassing previous methods that require expensive object detectors at test time.
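The two score functions and the two training signals described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity scorer, the max-then-mean aggregation into an image-sentence score, the softmax cross-entropy distillation term, the InfoNCE-style contrastive term, and all function names and temperatures are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_phrase_scores(regions, phrases):
    """Cosine similarity between each region and each phrase embedding.

    regions: (R, D), phrases: (P, D) -> scores: (R, P).
    Assumption: a plain cosine scorer stands in for the learned score function.
    """
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    p = phrases / np.linalg.norm(phrases, axis=1, keepdims=True)
    return r @ p.T

def image_sentence_score(rp_scores):
    """Assumed aggregation: best-matching region per phrase, averaged over phrases."""
    return float(rp_scores.max(axis=0).mean())

def distillation_loss(rp_scores, teacher_scores, temperature=1.0):
    """Cross-entropy between teacher and student distributions over regions.

    teacher_scores (R, P) plays the role of the soft matching scores between
    detected object names and candidate phrases (hypothetical input here).
    """
    student = softmax(rp_scores / temperature, axis=0)
    teacher = softmax(teacher_scores / temperature, axis=0)
    return float(-(teacher * np.log(student + 1e-12)).sum(axis=0).mean())

def contrastive_loss(pair_scores, temperature=0.1):
    """InfoNCE-style loss over a (B, B) image-sentence score matrix.

    Diagonal entries are the ground-truth (matched) image-sentence pairs.
    """
    probs = softmax(pair_scores / temperature, axis=1)
    return float(-np.log(np.diag(probs) + 1e-12).mean())

# Toy usage with random embeddings (5 regions, 3 phrases, 8 dimensions).
rng = np.random.default_rng(0)
rp = region_phrase_scores(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
teacher = rng.normal(size=(5, 3))  # stand-in for object-name/phrase soft scores
print("image-sentence score:", image_sentence_score(rp))
print("distillation loss:", distillation_loss(rp, teacher))
print("grounded region per phrase:", rp.argmax(axis=0))
```

In this sketch the teacher scores are only consumed by `distillation_loss`, so grounding a phrase at test time needs nothing beyond `region_phrase_scores`, which mirrors the abstract's point that no object detector is required at inference.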
Pages: 14085 - 14095
Number of pages: 11
Related papers
50 in total
  • [1] Improving weakly supervised phrase grounding via visual representation contextualization with contrastive learning
    Wang, Xue
    Du, Youtian
    Verberne, Suzan
    Verbeek, Fons J.
    APPLIED INTELLIGENCE, 2023, 53 (11) : 14690 - 14702
  • [3] Weakly Supervised Referring Expression Grounding via Target-Guided Knowledge Distillation
    Mi, Jinpeng
    Tang, Song
    Ma, Zhiyuan
    Liu, Dan
    Li, Qingdu
    Zhang, Jianwei
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023: 8299 - 8305
  • [4] Weakly Supervised Referring Expression Grounding via Dynamic Self-Knowledge Distillation
    Mi, Jinpeng
    Chen, Zhiqian
    Zhang, Jianwei
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023: 1254 - 1260
  • [5] Counterfactual contrastive learning for weakly supervised temporal sentence grounding
    Xu, Yenan
    Xu, Wanru
    Miao, Zhenjiang
    NEUROCOMPUTING, 2025, 624
  • [6] Contrastive Perturbation Network for Weakly Supervised Temporal Sentence Grounding
    Han, Tingting
    Lv, Yuanxin
    Yu, Zhou
    Yu, Jun
    Fan, Jianping
    Yuan, Liu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 446 - 460
  • [7] Visual Grounding With Dual Knowledge Distillation
    Wu, Wansen
    Cao, Meng
    Hu, Yue
    Peng, Yong
    Qin, Long
    Yin, Quanjun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 10399 - 10410
  • [8] Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
    Chen, Kan
    Gao, Jiyang
    Nevatia, Ram
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4042 - 4050
  • [9] Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
    Xiao, Fanyi
    Sigal, Leonid
    Lee, Yong Jae
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5253 - 5262
  • [10] Atomic-action-based Contrastive Network for Weakly Supervised Temporal Language Grounding
    Wu, Hongzhou
    Lyu, Yifan
    Shen, Xingyu
    Zhao, Xuechen
    Wang, Mengzhu
    Zhang, Xiang
    Luo, Zhigang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023: 1523 - 1528