VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Cited by: 0
Authors
Ma, Han [1 ]
Fan, Baoyu [1 ]
Ng, Benjamin K. [1 ]
Lam, Chan-Tong [1 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 3
Keywords
vision language learning; representation alignment; multimodal learning; meta learning; few-shot learning; visual question answering;
DOI
10.3390/app14031169
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Complex real-world tasks such as visual question answering (VQA) involve models of different modalities. However, traditional multimodal learning requires large amounts of aligned data, such as image-text pairs, and constructing such training data at scale is a major challenge. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) introduces modal alignment, which projects visual features into the language space through a lightweight network and improves the model's multimodal understanding; (2) applies few-shot meta learning to the multimodal problem by constructing a pool of few-shot meta tasks to improve generalization; (3) introduces semantic alignment to strengthen the model's semantic understanding of the task, the context, and the demonstrations; (4) introduces task alignment, which reshapes the training data into the form of the target task to improve task understanding; and (5) introduces generation alignment, which adopts token-level training and a multitask fusion loss to improve the model's generation ability. Experimental results demonstrate the effectiveness of VL-Few on multimodal few-shot problems.
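The abstract describes the modal alignment and generation alignment components only at a high level. The following minimal PyTorch sketch illustrates one plausible reading of them: a lightweight trainable projector that maps frozen visual features into the language model's embedding space, and a multitask fusion loss formed as a weighted sum of token-level cross-entropy terms. All names and dimensions (VisualToLanguageProjector, multitask_fusion_loss, vis_dim, lm_dim) are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn


class VisualToLanguageProjector(nn.Module):
    """Hypothetical lightweight alignment network: projects features from a
    frozen vision encoder into the language model's token embedding space."""

    def __init__(self, vis_dim: int = 768, lm_dim: int = 1024, hidden_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, lm_dim),
        )

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_patches, vis_dim) from a frozen vision encoder
        # returns:      (batch, num_patches, lm_dim), usable as soft "visual tokens"
        return self.proj(visual_feats)


def multitask_fusion_loss(task_logits, task_targets, task_weights):
    """One plausible reading of the 'multitask fusion loss': a weighted sum of
    token-level cross-entropy losses, one term per training task."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks padded/ignored tokens
    losses = []
    for logits, targets, w in zip(task_logits, task_targets, task_weights):
        # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)
        losses.append(w * ce(logits.flatten(0, 1), targets.flatten()))
    return torch.stack(losses).sum()


# Example usage with random tensors standing in for real encoder/LM outputs.
projector = VisualToLanguageProjector()
visual_tokens = projector(torch.randn(2, 16, 768))   # (2, 16, 1024)
logits = torch.randn(2, 8, 32000)                     # placeholder LM logits
targets = torch.randint(0, 32000, (2, 8))             # placeholder token targets
loss = multitask_fusion_loss([logits], [targets], [1.0])
```

Whether the actual alignment network is an MLP, how the visual tokens are inserted into the language model's input, and how the per-task weights are chosen are all details the abstract does not specify.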
Pages: 19
Related Papers
50 related records in total (entries [31]-[40] shown below)
  • [31] Survey on Few-shot Learning. Zhao K.-L., Jin X.-L., Wang Y.-Z. Ruan Jian Xue Bao/Journal of Software, 2021, 32(2): 349-369.
  • [32] Variational Few-Shot Learning. Zhang, Jian; Zhao, Chenglong; Ni, Bingbing; Xu, Minghao; Yang, Xiaokang. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 1685-1694.
  • [33] Meta-transfer-adjustment learning for few-shot learning. Chen, Yadang; Yan, Hui; Yang, Zhi-Xin; Wu, Enhua. Journal of Visual Communication and Image Representation, 2022, 89.
  • [34] Fractal Few-Shot Learning. Zhou, Fobao; Huang, Wenkai. IEEE Transactions on Neural Networks and Learning Systems, 2023, 35(11): 1-15.
  • [35] Fractal Few-Shot Learning. Zhou, Fobao; Huang, Wenkai. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(11): 16353-16367.
  • [36] Read-only Prompt Optimization for Vision-Language Few-shot Learning. Lee, Dongjun; Song, Seokwon; Suh, Jihee; Choi, Joonmyeong; Lee, Sanghyeok; Kim, Hyunwoo J. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 1401-1411.
  • [37] Meta-pruning: Learning to Prune on Few-Shot Learning. Chu, Yan; Liu, Keshi; Jiang, Songhao; Sun, Xianghui; Wang, Baoxu; Wang, Zhengkui. Knowledge Science, Engineering and Management (KSEM 2024), Part I, 2024, 14884: 74-85.
  • [38] Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization. Zhu, Lin; Yin, Weihan; Yang, Yiyao; Wu, Fan; Zeng, Zhaoyu; Gu, Qinying; Wang, Xinbing; Zhou, Chenghu; Ye, Nanyang. International Journal of Computer Vision, 2024, 132(9): 3375-3407.
  • [39] Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning. Hao, Fusheng; He, Fengxiang; Cheng, Jun; Wang, Lei; Cao, Jianzhong; Tao, Dacheng. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 8459-8468.
  • [40] Interventional Few-Shot Learning. Yue, Zhongqi; Zhang, Hanwang; Sun, Qianru; Hua, Xian-Sheng. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, 33.