VL-Few: Vision Language Alignment for Multimodal Few-Shot Meta Learning

Cited by: 0
Authors
Ma, Han [1 ]
Fan, Baoyu [1 ]
Ng, Benjamin K. [1 ]
Lam, Chan-Tong [1 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Iss. 03
Keywords
vision language learning; representation alignment; multimodal learning; meta learning; few-shot learning; visual question answering;
DOI
10.3390/app14031169
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Complex real-world tasks, such as visual question answering (VQA), involve models of different modalities. However, traditional multimodal learning requires large amounts of aligned data, such as image-text pairs, and constructing such training data at scale is a major challenge. We therefore propose VL-Few, a simple and effective method for the multimodal few-shot problem. VL-Few (1) introduces modal alignment, which maps visual features into the language space through a lightweight network, improving the model's multimodal understanding; (2) adopts few-shot meta learning for the multimodal problem, constructing a few-shot meta task pool to improve the model's generalization; (3) proposes semantic alignment to enhance the model's semantic understanding of the task, context, and demonstrations; (4) proposes task alignment, which casts the training data into the target task form to improve the model's task understanding; (5) proposes generation alignment, which adopts token-level training and a multitask fusion loss to improve the model's generation ability. Our experimental results show the effectiveness of VL-Few on multimodal few-shot problems.
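The modal alignment in point (1) — mapping visual features into the language space through a lightweight network — can be sketched as a learned projection that turns pooled image features into a short sequence of "visual tokens" in the language model's embedding space, in the spirit of Frozen-style prefix conditioning. This is a minimal illustration, not the paper's exact architecture; the dimensions, the number of prefix tokens, and the function name `project` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper):
VISION_DIM, LANG_DIM, N_PREFIX = 768, 1024, 4

# Lightweight projector: one learned linear map whose output is reshaped
# into N_PREFIX visual prefix tokens in the language embedding space.
W = rng.standard_normal((VISION_DIM, LANG_DIM * N_PREFIX)) * 0.02
b = np.zeros(LANG_DIM * N_PREFIX)

def project(vision_feats):
    """Map pooled image features (batch, VISION_DIM) to a sequence of
    visual prefix tokens (batch, N_PREFIX, LANG_DIM) that a frozen
    language model could consume alongside its text embeddings."""
    out = vision_feats @ W + b
    return out.reshape(-1, N_PREFIX, LANG_DIM)

img_feats = rng.standard_normal((2, VISION_DIM))  # stand-in for a frozen vision encoder's output
prefix = project(img_feats)
print(prefix.shape)  # (2, 4, 1024)
```

In such a setup only the projector's parameters are trained, which keeps the few-shot adaptation lightweight relative to updating the full vision or language backbone.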
Pages: 19
Related Papers
50 records in total
  • [1] Multimodal Few-Shot Learning with Frozen Language Models
    Tsimpoukelli, Maria
    Menick, Jacob
    Cabi, Serkan
    Eslami, S. M. Ali
    Vinyals, Oriol
    Hill, Felix
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Multimodal Few-Shot Learning for Gait Recognition
    Moon, Jucheol
    Nhat Anh Le
    Minaya, Nelson Hebert
    Choi, Sang-Il
    APPLIED SCIENCES-BASEL, 2020, 10 (21): : 1 - 15
  • [3] Learning Meta Soft Prompt for Few-Shot Language Models
    Chien, Jen-Tzung
    Chen, Ming-Yen
    Xue, Jing-Hao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 57 - 62
  • [4] Multimodal Prototypical Networks for Few-shot Learning
    Pahde, Frederik
    Puscas, Mihai
    Klein, Tassilo
    Nabi, Moin
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2643 - 2652
  • [5] Learning Dynamic Alignment via Meta-filter for Few-shot Learning
    Xu, Chengming
    Fu, Yanwei
    Liu, Chen
    Wang, Chengjie
    Li, Jilin
    Huang, Feiyue
    Zhang, Li
    Xue, Xiangyang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5178 - 5187
  • [6] Few-Shot Few-Shot Learning and the role of Spatial Attention
    Lifchitz, Yann
    Avrithis, Yannis
    Picard, Sylvaine
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2693 - 2700
  • [7] True Few-Shot Learning with Language Models
    Perez, Ethan
    Kiela, Douwe
    Cho, Kyunghyun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [8] Empowering few-shot learning: a multimodal optimization framework
    Liriam Enamoto
    Geraldo Pereira Rocha Filho
    Li Weigang
    Neural Computing and Applications, 2025, 37 (5) : 3539 - 3560
  • [9] Multimodal cross-decoupling for few-shot learning
    Ji Z.
    Wang S.
    Yu Y.
    Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2024, 46 (01): : 12 - 21