Subsampling of Frequent Words in Text for Pre-training a Vision-Language Model

Cited by: 0

Authors:

Liang, Mingliang [1]

Larson, Martha [1]

Affiliation:

[1] Radboud Univ Nijmegen, Nijmegen, Netherlands

Source:

PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023 | 2023

Keywords:

Vision-language model; subsampling; frequent words; zero-shot image classification

DOI:

10.1145/3607827.3616843

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we introduce Subsampling of frequent Words for Contrastive Language-Image Pre-training (SW-CLIP), a novel approach for training Vision-Language Models (VLMs). SW-CLIP takes the frequency-based subsampling of words that was previously proposed for training skip-gram models in natural language processing and applies it to the textual training data of VLMs. We report on experiments that demonstrate the ability of frequency-based subsampling to speed up training and also to deliver a substantial improvement in accuracy on a number of downstream zero-shot (i.e., transfer) classification tasks. We notice that the classification test sets on which SW-CLIP seems to be particularly effective are those in which the class labels occur infrequently as words in the training data, and thus have a high probability of being retained during frequency-based subsampling of the model training data. Overall, the advantages of SW-CLIP demonstrated in this paper serve to motivate further work on text subsampling for the training of VLMs. Our code and pre-trained weights are available at https://github.com/Anastasiais-ml/sw_clip.git
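As an illustration of the frequency-based subsampling the abstract refers to, the following is a minimal sketch based on the rule commonly used for skip-gram training, in which a word with relative frequency f(w) is discarded with probability 1 - sqrt(t / f(w)). The abstract does not state the exact formula or threshold used by SW-CLIP, so the threshold value, the whitespace tokenization, and the function names `keep_probability` and `subsample_captions` are assumptions made for this example and are not taken from the authors' code.

```python
import math
import random
from collections import Counter


def keep_probability(freq, threshold):
    """Probability of keeping a word whose relative frequency is `freq`.

    Skip-gram style rule: discard with probability 1 - sqrt(threshold / freq),
    so words rarer than `threshold` are always kept.
    """
    return min(1.0, math.sqrt(threshold / freq))


def subsample_captions(captions, threshold=1e-3, seed=0):
    """Randomly drop frequent words from each caption.

    Assumptions for illustration only: lowercase whitespace tokenization,
    and frequencies estimated from the given captions themselves.
    """
    rng = random.Random(seed)
    tokenized = [caption.lower().split() for caption in captions]

    # Relative frequency of every word over the whole caption collection.
    counts = Counter(word for words in tokenized for word in words)
    total = sum(counts.values())

    subsampled = []
    for words in tokenized:
        kept = [
            word
            for word in words
            if rng.random() < keep_probability(counts[word] / total, threshold)
        ]
        subsampled.append(" ".join(kept))
    return subsampled


if __name__ == "__main__":
    captions = ["a photo of a dog", "a photo of a cat", "a photo of a axolotl"]
    # A deliberately large threshold for this tiny corpus, so that only the
    # most frequent words ("a", "photo", "of") are candidates for removal.
    for line in subsample_captions(captions, threshold=0.1):
        print(line)
```

Under this rule, rare words such as class names that seldom appear in captions are always retained, while frequent function words are dropped more often, which shortens the text the model has to process. This is consistent with the abstract's observation about infrequent class labels, though the behavior of the actual SW-CLIP implementation should be checked against the authors' repository.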


Pages: 61-67

Number of pages: 7
