Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Times cited: 27
Authors
Wang, Liwei [1]
Huang, Jing [2]
Li, Yin [3]
Xu, Kun [4]
Yang, Zhengyuan [5]
Yu, Dong [4]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Univ Illinois, Champaign, IL USA
[3] Univ Wisconsin Madison, Madison, WI USA
[4] Tencent AI Lab, Bellevue, WA USA
[5] Univ Rochester, Rochester, NY 14627 USA
DOI
10.1109/CVPR46437.2021.01387
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Weakly supervised phrase grounding aims at learning region-phrase correspondences using only image-sentence pairs. A major challenge thus lies in the missing links between image regions and sentence phrases during training. To address this challenge, we leverage a generic object detector at training time, and propose a contrastive learning framework that accounts for both region-phrase and image-sentence matching. Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed. Importantly, our region-phrase score function is learned by distilling from soft matching scores between the detected object names and candidate phrases within an image-sentence pair, while the image-sentence score function is supervised by ground-truth image-sentence pairs. The design of such score functions removes the need for object detection at test time, thereby significantly reducing the inference cost. Without bells and whistles, our approach achieves state-of-the-art results on visual phrase grounding, surpassing previous methods that require expensive object detectors at test time.
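The two training signals described in the abstract can be illustrated with a minimal sketch in plain Python. Everything here is an assumption for illustration, not the authors' implementation: the function names (`region_phrase_scores`, `distillation_loss`, `image_sentence_score`, `image_sentence_contrastive_loss`) are hypothetical, the student scorer is a simple dot product, the distillation term is a cross-entropy between temperature-scaled softmax distributions over regions, the image-sentence score takes the best region per phrase and averages over phrases, and the image-sentence loss is an in-batch InfoNCE. The teacher matrix stands in for precomputed similarities between detected object names and phrases.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows of vectors, or a phrases-by-regions score table


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax(xs: List[float], tau: float = 1.0) -> List[float]:
    # Numerically stable log-softmax with temperature tau.
    scaled = [x / tau for x in xs]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def softmax(xs: List[float], tau: float = 1.0) -> List[float]:
    return [math.exp(v) for v in log_softmax(xs, tau)]


def region_phrase_scores(regions: Matrix, phrases: Matrix) -> Matrix:
    # Hypothetical student scorer: scores[j][i] = <phrase j, region i>.
    return [[dot(r, p) for r in regions] for p in phrases]


def distillation_loss(student: Matrix, teacher: Matrix, tau: float = 1.0) -> float:
    # For each phrase, the teacher's scores over regions become a soft target
    # distribution; the student is penalized by cross-entropy against it.
    # teacher[j][i] is assumed to be a precomputed similarity between phrase j
    # and the detected object name of region i (needed only during training).
    total = 0.0
    for s_row, t_row in zip(student, teacher):
        target = softmax(t_row, tau)
        log_pred = log_softmax(s_row, tau)
        total -= sum(t * lp for t, lp in zip(target, log_pred))
    return total / len(student)


def image_sentence_score(regions: Matrix, phrases: Matrix) -> float:
    # Assumed aggregation: best-matching region per phrase, averaged over phrases.
    scores = region_phrase_scores(regions, phrases)
    return sum(max(row) for row in scores) / len(scores)


def image_sentence_contrastive_loss(
    images: List[Matrix], sentences: List[Matrix], tau: float = 1.0
) -> float:
    # In-batch InfoNCE: sentence k is paired with image k (the ground-truth pair);
    # all other images in the batch act as negatives.
    total = 0.0
    for k, phrases in enumerate(sentences):
        logits = [image_sentence_score(regions, phrases) for regions in images]
        total -= log_softmax(logits, tau)[k]
    return total / len(sentences)


if __name__ == "__main__":
    images = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two images, one region feature each
    sentences = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two sentences, one phrase feature each
    print(image_sentence_contrastive_loss(images, sentences))
    student = region_phrase_scores([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]])
    print(distillation_loss(student, [[2.0, 0.0]]))
```

Under these assumptions, only `region_phrase_scores` is needed at inference; the teacher matrix enters solely through the training-time distillation term, which mirrors the abstract's point that no object detector is required at test time.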
Pages: 14085-14095
Number of pages: 11
Related papers (50 in total)
  • [21] Weakly-Supervised Generation and Grounding of Visual Descriptions with Conditional Generative Models
    Mavroudi, Effrosyni
    Vidal, Rene
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15523 - 15533
  • [22] Improving Event Representation via Simultaneous Weakly Supervised Contrastive Learning and Clustering
    Gao, Jun
    Wang, Wei
    Yu, Changlong
    Zhao, Huan
    Ng, Wilfred
    Xu, Ruifeng
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3036 - 3049
  • [23] Knowledge Consistency Distillation for Weakly Supervised One Step Person Search
    Li, Zongyi
    Shi, Yuxuan
    Ling, Hefei
    Chen, Jiazhong
    Wang, Runsheng
    Zhao, Chengxin
    Wang, Qian
    Huang, Shijuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (11) : 11695 - 11708
  • [24] Adaptive knowledge distillation and integration for weakly supervised referring expression comprehension
    Mi, Jinpeng
    Wermter, Stefan
    Zhang, Jianwei
    KNOWLEDGE-BASED SYSTEMS, 2024, 286
  • [25] Self-Supervised Contrastive Learning for Camera-to-Radar Knowledge Distillation
    Wang, Wenpeng
    Campbell, Bradford
    Munir, Sirajum
    2024 20TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SMART SYSTEMS AND THE INTERNET OF THINGS, DCOSS-IOT 2024, 2024, : 154 - 161
  • [26] Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
    Yu, Jiashuo
    Liu, Jinyu
    Cheng, Ying
    Feng, Rui
    Zhang, Yuejie
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 6278 - 6287
  • [27] Improving Structural and Semantic Global Knowledge in Graph Contrastive Learning with Distillation
    Wen, Mi
    Wang, Hongwei
    Xue, Yunsheng
    Wu, Yi
    Wen, Hong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PAKDD 2024, 2024, 14646 : 364 - 375
  • [28] Online Knowledge Distillation via Mutual Contrastive Learning for Visual Recognition
    Yang, Chuanguang
    An, Zhulin
    Zhou, Helong
    Zhuang, Fuzhen
    Xu, Yongjun
    Zhang, Qian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 10212 - 10227
  • [29] Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding
    Liu, Yang
    Zhang, Jiahua
    Chen, Qingchao
    Peng, Yuxin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2816 - 2826
  • [30] Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
    Huang, De-An
    Buch, Shyamal
    Dery, Lucio
    Garg, Animesh
    Li, Fei-Fei
    Niebles, Juan Carlos
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5948 - 5957