VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Authors
Ma, Han [1]
Fan, Baoyu [1]
Ng, Benjamin K. [1]
Lam, Chan-Tong [1]
Affiliation
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 03
Keywords
vision language learning; representation alignment; multimodal learning; meta learning; few-shot learning; visual question answering
DOI
10.3390/app14031169
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Complex real-world tasks such as visual question answering (VQA) involve multiple modalities. However, traditional multimodal learning requires a large amount of aligned data, such as image-text pairs, and constructing training data at that scale is difficult. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) proposes modal alignment, which maps visual features into the language space through a lightweight network and improves the model's multimodal understanding; (2) applies few-shot meta learning to the multimodal problem by constructing a few-shot meta task pool, which improves the model's generalization; (3) proposes semantic alignment to enhance the model's semantic understanding of the task, context, and demonstration; (4) proposes task alignment, which casts the training data into the form of the target task and improves the model's task understanding; and (5) proposes generation alignment, which adopts token-level training and a multitask fusion loss to improve the model's generation ability. Experimental results show the effectiveness of VL-Few on multimodal few-shot problems.
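The abstract mentions a few-shot meta task pool but does not say how it is built. The sketch below shows one common, generic way to sample N-way K-shot episodes from labeled data; it is an illustration only and not necessarily the procedure used in VL-Few. The function name `build_task_pool`, its parameters, and the `support`/`query` dictionary keys are all hypothetical.

```python
import random
from collections import defaultdict


def build_task_pool(samples, n_way, k_shot, n_query, n_tasks, seed=0):
    """Sample N-way K-shot episodes from (item, label) pairs.

    Generic episodic sampling (an assumption), not the paper's own method.
    """
    rng = random.Random(seed)

    # Group items by their label.
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    # Only labels with enough items for support + query can be sampled.
    eligible = sorted(
        label for label, items in by_label.items()
        if len(items) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough labels with k_shot + n_query items")

    pool = []
    for _ in range(n_tasks):
        support, query = [], []
        for label in rng.sample(eligible, n_way):
            # Draw disjoint support and query items for this label.
            picked = rng.sample(by_label[label], k_shot + n_query)
            support += [(x, label) for x in picked[:k_shot]]
            query += [(x, label) for x in picked[k_shot:]]
        pool.append({"support": support, "query": query})
    return pool
```

Each task in the returned pool holds `n_way * k_shot` support examples and `n_way * n_query` query examples, with no item shared between the two sets.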
Pages: 19