VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Cited by: 0
Authors
Ma, Han [1 ]
Fan, Baoyu [1 ]
Ng, Benjamin K. [1 ]
Lam, Chan-Tong [1 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Iss. 03
Keywords
vision language learning; representation alignment; multimodal learning; meta learning; few-shot learning; visual question answering;
DOI
10.3390/app14031169
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Complex real-world tasks, such as visual question answering (VQA), involve models of different modalities. However, traditional multimodal learning requires large amounts of aligned data, such as image-text pairs, and constructing such training data at scale is a major challenge. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) introduces modal alignment, which maps visual features into the language space through a lightweight network, improving the model's multimodal understanding; (2) adopts few-shot meta learning for the multimodal problem, constructing a few-shot meta task pool to improve the model's generalization; (3) proposes semantic alignment to enhance the model's semantic understanding of the task, context, and demonstrations; (4) proposes task alignment, which casts the training data into the target task form to improve the model's task understanding; (5) proposes generation alignment, which adopts token-level training and a multitask fusion loss to improve the model's generation ability. Our experimental results show the effectiveness of VL-Few on multimodal few-shot problems.
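The modal alignment in point (1) — mapping visual features into the language space through a lightweight network — can be sketched as a learned projection that turns pooled image features into a short sequence of "visual tokens" in the language model's embedding space, in the spirit of Frozen-style prefix conditioning. This is a minimal illustration, not the paper's exact architecture; the dimensions, the number of prefix tokens, and the function name `project` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper):
VISION_DIM, LANG_DIM, N_PREFIX = 768, 1024, 4

# Lightweight projector: one learned linear map whose output is reshaped
# into N_PREFIX visual prefix tokens in the language embedding space.
W = rng.standard_normal((VISION_DIM, LANG_DIM * N_PREFIX)) * 0.02
b = np.zeros(LANG_DIM * N_PREFIX)

def project(vision_feats):
    """Map pooled image features (batch, VISION_DIM) to a sequence of
    visual prefix tokens (batch, N_PREFIX, LANG_DIM) that a frozen
    language model could consume alongside its text embeddings."""
    out = vision_feats @ W + b
    return out.reshape(-1, N_PREFIX, LANG_DIM)

img_feats = rng.standard_normal((2, VISION_DIM))  # stand-in for a frozen vision encoder's output
prefix = project(img_feats)
print(prefix.shape)  # (2, 4, 1024)
```

In such a setup only the projector's parameters are trained, which keeps the few-shot adaptation lightweight relative to updating the full vision or language backbone.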
Pages: 19
Related Papers
50 records in total
  • [1] Multimodal Few-Shot Learning with Frozen Language Models
    Tsimpoukelli, Maria
    Menick, Jacob
    Cabi, Serkan
    Eslami, S. M. Ali
    Vinyals, Oriol
    Hill, Felix
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Multimodal Few-Shot Learning for Gait Recognition
    Moon, Jucheol
    Nhat Anh Le
    Minaya, Nelson Hebert
    Choi, Sang-Il
    APPLIED SCIENCES-BASEL, 2020, 10 (21): : 1 - 15
  • [3] Learning Meta Soft Prompt for Few-Shot Language Models
    Chien, Jen-Tzung
    Chen, Ming-Yen
    Xue, Jing-Hao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 57 - 62
  • [4] Multimodal Prototypical Networks for Few-shot Learning
    Pahde, Frederik
    Puscas, Mihai
    Klein, Tassilo
    Nabi, Moin
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2643 - 2652
  • [5] Learning Dynamic Alignment via Meta-filter for Few-shot Learning
    Xu, Chengming
    Fu, Yanwei
    Liu, Chen
    Wang, Chengjie
    Li, Jilin
    Huang, Feiyue
    Zhang, Li
    Xue, Xiangyang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5178 - 5187
  • [6] Few-Shot Few-Shot Learning and the role of Spatial Attention
    Lifchitz, Yann
    Avrithis, Yannis
    Picard, Sylvaine
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2693 - 2700
  • [7] True Few-Shot Learning with Language Models
    Perez, Ethan
    Kiela, Douwe
    Cho, Kyunghyun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [8] Empowering few-shot learning: a multimodal optimization framework
    Liriam Enamoto
    Geraldo Pereira Rocha Filho
    Li Weigang
    Neural Computing and Applications, 2025, 37 (5) : 3539 - 3560
  • [9] Multimodal cross-decoupling for few-shot learning
    Ji Z.
    Wang S.
    Yu Y.
    Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2024, 46 (01): : 12 - 21