Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Cited by: 27
Authors
Wang, Liwei [1 ]
Huang, Jing [2 ]
Li, Yin [3 ]
Xu, Kun [4 ]
Yang, Zhengyuan [5 ]
Yu, Dong [4 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Illinois, Champaign, IL USA
[3] Univ Wisconsin Madison, Madison, WI USA
[4] Tencent AI Lab, Bellevue, WA USA
[5] Univ Rochester, Rochester, NY 14627 USA
DOI
10.1109/CVPR46437.2021.01387
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised phrase grounding aims to learn region-phrase correspondences using only image-sentence pairs. A major challenge therefore lies in the missing links between image regions and sentence phrases during training. To address this challenge, we leverage a generic object detector at training time and propose a contrastive learning framework that accounts for both region-phrase and image-sentence matching. Our core innovation is the learning of a region-phrase score function, from which an image-sentence score function is further constructed. Importantly, our region-phrase score function is learned by distilling from soft matching scores between the detected object names and candidate phrases within an image-sentence pair, while the image-sentence score function is supervised by ground-truth image-sentence pairs. The design of these score functions removes the need for object detection at test time, thereby significantly reducing the inference cost. Without bells and whistles, our approach achieves state-of-the-art results on visual phrase grounding, surpassing previous methods that require expensive object detectors at test time.
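The abstract couples two score functions: a region-phrase score distilled from detector-derived soft targets, and an image-sentence score built on top of it and trained contrastively. The following is a minimal pure-Python sketch of that structure under loudly stated assumptions, not the authors' implementation: region features and phrase embeddings are assumed to share one space and are scored by dot product; the teacher's soft targets are assumed to be a temperature softmax over the similarity between each region's detected object-name embedding and the phrase; the image-sentence score is assumed to be the mean over phrases of the best region score; and the contrastive term is an InfoNCE-style loss over candidate images. All function names, embeddings, and the temperature value are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch: dot-product scoring, max-then-mean aggregation and an
# InfoNCE-style loss are assumptions, not the paper's exact formulation.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(xs: List[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    return [math.exp(v) for v in log_softmax([x / temperature for x in xs])]


def region_phrase_scores(regions: List[Vector], phrases: List[Vector]) -> List[List[float]]:
    # Student score s(region, phrase); rows are phrases, columns are regions.
    return [[dot(r, p) for r in regions] for p in phrases]


def teacher_targets(label_embs: List[Vector], phrases: List[Vector],
                    temperature: float = 0.1) -> List[List[float]]:
    # Training-only teacher: soft matching between the (hypothetical) embedding
    # of each region's detected object name and each phrase.
    return [softmax([dot(l, p) for l in label_embs], temperature) for p in phrases]


def distillation_loss(student: List[List[float]], targets: List[List[float]]) -> float:
    # Cross-entropy between the teacher distribution and the student softmax
    # over regions, averaged over phrases.
    total = 0.0
    for s_row, t_row in zip(student, targets):
        total -= sum(t * lq for t, lq in zip(t_row, log_softmax(s_row)))
    return total / len(student)


def image_sentence_score(regions: List[Vector], phrases: List[Vector]) -> float:
    # Built only from region-phrase scores: best region per phrase, then mean.
    rows = region_phrase_scores(regions, phrases)
    return sum(max(row) for row in rows) / len(rows)


def contrastive_loss(images: List[List[Vector]], phrases: List[Vector], pos: int) -> float:
    # InfoNCE-style: the ground-truth image should outscore the other images.
    logits = [image_sentence_score(regs, phrases) for regs in images]
    return -log_softmax(logits)[pos]


# Toy 2-D example with hypothetical embeddings for two phrases.
phrases = [[1.0, 0.0], [0.0, 1.0]]
image_a = [[2.0, 0.0], [0.0, 2.0]]      # regions aligned with the phrases
image_b = [[-1.0, 0.0], [0.0, -1.0]]    # unrelated regions
labels_a = [[1.0, 0.0], [0.0, 1.0]]     # detector label embeddings for image_a

targets = teacher_targets(labels_a, phrases)
kd_aligned = distillation_loss(region_phrase_scores(image_a, phrases), targets)
kd_swapped = distillation_loss(region_phrase_scores(image_a[::-1], phrases), targets)
nce_correct = contrastive_loss([image_a, image_b], phrases, pos=0)
nce_wrong = contrastive_loss([image_a, image_b], phrases, pos=1)
```

In this sketch only `region_phrase_scores` is needed at inference; `teacher_targets` consumes detector output during training alone, which mirrors the abstract's point that no object detector is required at test time.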
Pages: 14085-14095
Number of pages: 11