Subsampling of Frequent Words in Text for Pre-training a Vision-Language Model

Cited by: 0
Authors
Liang, Mingliang [1]
Larson, Martha [1]
Affiliations
[1] Radboud Univ Nijmegen, Nijmegen, Netherlands
Keywords
Vision-language model; subsampling; frequent words; zero-shot image classification
DOI
10.1145/3607827.3616843
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce Subsampling of frequent Words for Contrastive Language-Image Pre-training (SW-CLIP), a novel approach for training Vision-Language Models (VLMs). SW-CLIP takes the frequency-based subsampling of words that was previously proposed for training skip-gram models in natural language processing and applies it to the textual training data of VLMs. We report experiments demonstrating that frequency-based subsampling speeds up training and also delivers a substantial improvement in accuracy on a number of downstream zero-shot (i.e., transfer) classification tasks. We observe that the classification test sets on which SW-CLIP appears to be particularly effective are those whose class labels occur infrequently as words in the training data and thus have a high probability of being retained during frequency-based subsampling. Overall, the advantages of SW-CLIP demonstrated in this paper motivate further work on text subsampling for the training of VLMs. Our code and pre-trained weights are available at https://github.com/Anastasiais-ml/sw_clip.git
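As a rough illustration of the frequency-based subsampling the abstract refers to, the sketch below applies the discard rule commonly used for skip-gram training, P(discard w) = max(0, 1 - sqrt(t / f(w))), where f(w) is the relative frequency of word w, to a toy set of captions. This record does not give SW-CLIP's exact formula, threshold, or tokenization, so the function names, the whitespace tokenizer, and the threshold `t` are assumptions made purely for illustration; the authors' repository should be treated as the reference.

```python
import math
import random
from collections import Counter


def discard_probabilities(captions, t=1e-4):
    """Per-word discard probability from relative word frequency.

    Assumption: the skip-gram style rule 1 - sqrt(t / f(w)), clipped at 0.
    Words with f(w) <= t are never discarded.
    """
    counts = Counter(w for caption in captions for w in caption.split())
    total = sum(counts.values())
    return {
        w: max(0.0, 1.0 - math.sqrt(t / (n / total)))
        for w, n in counts.items()
    }


def subsample(caption, p_discard, rng):
    """Drop each word independently with its discard probability."""
    kept = [w for w in caption.split() if rng.random() >= p_discard.get(w, 0.0)]
    return " ".join(kept)


# Toy data (hypothetical); t is set very high because the corpus is tiny.
captions = ["a photo of a dog", "a photo of a cat", "a photo of a zebra"]
p = discard_probabilities(captions, t=0.1)
rng = random.Random(0)
shortened = [subsample(c, p, rng) for c in captions]
```

In this toy corpus the frequent words ("a", "photo", "of") receive a non-zero discard probability, while the rare class-like words ("dog", "cat", "zebra") are always retained, which mirrors the abstract's observation about infrequent class labels. Shorter captions also mean fewer text tokens per training example, which is one plausible source of the reported training speed-up.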
Pages: 61-67
Page count: 7