Fine-Tuning for Few-Shot Image Classification by Multimodal Prototype Regularization

Cited by: 2
Authors
Wu, Qianhao [1]
Qi, Jiaxin [2]
Zhang, Dong [1]
Zhang, Hanwang
Tang, Jinhui [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Research Foundation, Singapore
Keywords
Training; Visualization; Testing; Task analysis; Prototypes; Feature extraction; Tuning; Few-shot classification; large pre-trained vision-language models; model fine-tuning; prototype regularization; NETWORK; MODELS;
DOI
10.1109/TMM.2024.3379896
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Large pre-trained vision-language models such as CLIP [Radford et al. 2021] have demonstrated remarkable performance in few-shot image classification. Two primary frameworks have been proposed for rapidly adapting CLIP to downstream tasks with limited visual samples. The first centers on the image encoder and adds a trainable visual classifier after the backbone to generate logits for each object class. However, it depends heavily on the limited visual features extracted by the pre-trained image encoder, which can lead to over-fitting. The second optimizes the text encoder by using trainable soft language prompts and computes the logit for each class from the similarity between image features and the optimized prompt features. However, the representations extracted by the image and text encoders are imperfectly aligned, which makes it difficult to fine-tune the language prompts using visual samples. This paper proposes Multi-Modal Prototype Regularization (MMPR), a method for CLIP-based few-shot fine-tuning for image classification that addresses the challenge of using image and text features effectively together. MMPR fine-tunes a classifier and regularizes its weights using both image-based (ImgPR) and text-based (TexPR) prototypes. ImgPR is the mean of the image representations within a class, derived from the image encoder, and distills class-specific visual distribution knowledge for classifier adaptation. TexPR is the representation of the hand-crafted prompt associated with the class, derived from the text encoder, and incorporates general encyclopedic knowledge to mitigate visual over-fitting. MMPR leverages both image and text information without increasing computational complexity at inference compared to existing methods. Experimental results on various challenging public benchmarks demonstrate the superiority of MMPR over state-of-the-art methods.
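The idea described in the abstract (a classifier whose weights are pulled toward per-class image and text prototypes) can be illustrated with a rough sketch. The abstract does not give the form of the regularizer, so the squared-distance penalty, the cosine-similarity logits, and the names `mmpr_style_loss`, `lam_img`, `lam_txt`, and `scale` below are assumptions made for illustration, not the authors' formulation. Features are assumed to be precomputed by frozen encoders; NumPy arrays stand in for them here.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def image_prototypes(features, labels, num_classes):
    """ImgPR-style prototype: mean of normalized image features per class.

    Assumes every class has at least one sample in `labels`.
    """
    feats = l2_normalize(features)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy, computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mmpr_style_loss(W, features, labels, img_protos, txt_protos,
                    lam_img=1.0, lam_txt=1.0, scale=10.0):
    """Classification loss plus two prototype penalties (assumed squared L2).

    W          : (num_classes, dim) classifier weights being fine-tuned
    img_protos : (num_classes, dim) per-class means of image features
    txt_protos : (num_classes, dim) text features of hand-crafted prompts
    """
    logits = scale * l2_normalize(features) @ l2_normalize(W).T
    ce = softmax_cross_entropy(logits, labels)
    reg_img = np.mean(np.sum((W - img_protos) ** 2, axis=1))
    reg_txt = np.mean(np.sum((W - txt_protos) ** 2, axis=1))
    return ce + lam_img * reg_img + lam_txt * reg_txt

# Toy usage: 3 classes, 4 shots each, 8-dimensional stand-in features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 8))
labels = np.repeat(np.arange(3), 4)
img_p = image_prototypes(feats, labels, 3)
txt_p = l2_normalize(rng.normal(size=(3, 8)))  # placeholder for text-encoder output
W0 = img_p.copy()                              # one possible initialization
loss = mmpr_style_loss(W0, feats, labels, img_p, txt_p)
```

In this sketch both penalties appear only in the training loss. At test time the classifier is a single matrix product over image features, which is consistent with the abstract's statement that inference cost does not increase.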
Pages: 8543-8556
Number of pages: 14
Related papers
50 in total
  • [41] Few-Shot Fine-Grained Image Classification via GNN
    Zhou, Xiangyu
    Zhang, Yuhui
    Wei, Qianru
    SENSORS, 2022, 22 (19)
  • [42] Few-Shot Fine-Grained Image Classification: A Comprehensive Review
    Ren, Jie
    Li, Changmiao
    An, Yaohui
    Zhang, Weichuan
    Sun, Changming
    AI, 2024, 5 (01) : 405 - 425
  • [43] DOMAIN GENERALIZED FEW-SHOT IMAGE CLASSIFICATION VIA META REGULARIZATION NETWORK
    Zhang, Min
    Huang, Siteng
    Wang, Donglin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3748 - 3752
  • [44] Adaptive prototype few-shot image classification method based on feature pyramid
    Shen, Linshan
    Feng, Xiang
    Xu, Li
    Ding, Weiyue
    PeerJ Computer Science, 2024, 10
  • [45] Dual-Channel Prototype Network for Few-Shot Pathology Image Classification
    Quan, Hao
    Li, Xinjia
    Hu, Dayu
    Nan, Tianhang
    Cui, Xiaoyu
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (07) : 4132 - 4144
  • [46] Few-shot incremental learning with continual prototype calibration for remote sensing image fine-grained classification
    Zhu, Zining
    Wang, Peijin
    Diao, Wenhui
    Yang, Jinze
    Wang, Hongqi
    Sun, Xian
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2023, 196 : 210 - 227
  • [48] Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
    Zhang, Haode
    Liang, Haowen
    Zh, Liming
    Lam, Albert Y. S.
    Wu, Xiao-Ming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11105 - 11119
  • [49] Few-shot electrical equipment image recognition method based on an improved two-stage fine-tuning approach
    Wu, Junpeng
    Zeng, Jiajun
    Zhou, Yibo
    Zhang, Ye
    Zhang, Yiwen
    JOURNAL OF ENGINEERING-JOE, 2023, 2023 (09):
  • [50] Few-Shot Classification Study for Prototype Fusion and Completion
    Wang, Yuheng
    Sun, Yanguo
    Lan, Zhenping
    Wang, Nan
    Li, Jiansong
    Yang, Xincheng
    IEEE Access, 2024, 12 : 174133 - 174143