Multimodal Instruction Tuning with Conditional Mixture of LoRA

Cited by: 0
Authors
Shen, Ying [1 ]
Xu, Zhiyang [1 ]
Wang, Qifan
Cheng, Yu [2 ]
Yin, Wenpeng [3 ]
Huang, Lifu [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[3] Penn State Univ, University Pk, PA 16802 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models on diverse multimodal tasks through instructions. As MLLMs grow in complexity and size, parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which fine-tunes with a minimal set of parameters, become essential. However, applying LoRA in multimodal instruction tuning presents the challenge of task interference, which leads to performance degradation, especially when dealing with a broad array of multimodal tasks. To address this, the paper introduces a novel approach that integrates multimodal instruction tuning with Conditional Mixture-of-LoRA (MixLoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance, with the aim of mitigating task interference. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA at the same or even higher ranks, demonstrating its efficacy and adaptability across diverse multimodal tasks.
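
As a rough illustration of the mechanism described in the abstract, the following is a minimal, hypothetical PyTorch sketch of a conditional mixture-of-LoRA layer: instead of one static pair of low-rank matrices, the layer keeps pools of rank-1 factors and a router that mixes them into instance-specific A and B matrices conditioned on an embedding of the input. The class name, the softmax routing scheme, and the use of a pooled instance embedding are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalMixLoRALinear(nn.Module):
    # Hypothetical sketch of a Conditional Mixture-of-LoRA layer (not the
    # authors' exact design). A frozen pretrained linear layer is augmented
    # with a low-rank update whose factors are composed per instance from
    # shared pools of rank-1 factors, guided by small routing networks.
    def __init__(self, in_features, out_features, rank=4, pool_size=32, scale=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.rank, self.scale = rank, scale
        # Pools of candidate rank-1 factors for the down- and up-projections.
        self.A_pool = nn.Parameter(torch.randn(pool_size, in_features) * 0.01)
        self.B_pool = nn.Parameter(torch.zeros(pool_size, out_features))
        # Routers map an instance embedding (assumed here to have dimension
        # in_features) to one mixing distribution over the pool per rank slot.
        self.router_A = nn.Linear(in_features, rank * pool_size)
        self.router_B = nn.Linear(in_features, rank * pool_size)

    def forward(self, x, instance_emb):
        # x: (batch, seq, in_features); instance_emb: (batch, in_features)
        b = x.size(0)
        w_A = F.softmax(self.router_A(instance_emb).view(b, self.rank, -1), dim=-1)
        w_B = F.softmax(self.router_B(instance_emb).view(b, self.rank, -1), dim=-1)
        # Instance-specific low-rank matrices A: (b, rank, in), B: (b, rank, out).
        A = torch.einsum("brp,pi->bri", w_A, self.A_pool)
        B = torch.einsum("brp,po->bro", w_B, self.B_pool)
        # LoRA-style update: base(x) + scale * ((x A^T) B).
        delta = torch.einsum("bsi,bri->bsr", x, A)
        delta = torch.einsum("bsr,bro->bso", delta, B)
        return self.base(x) + self.scale * delta

# Usage with arbitrary shapes, e.g. a 768-dim hidden size:
#   layer = ConditionalMixLoRALinear(768, 768)
#   y = layer(torch.randn(2, 16, 768), torch.randn(2, 768))

In this sketch only the factor pools and routers are trainable, which keeps the number of tuned parameters in the same regime as standard LoRA while letting the effective adapter vary per input instance.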
Pages: 637-648
Number of pages: 12
Related Papers
50 records in total
  • [1] Instruction Tuning Large Language Models for Multimodal Relation Extraction Using LoRA
    Li, Zou
    Pang, Ning
    Zhao, Xiang
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 364 - 376
  • [2] UMIE: Unified Multimodal Information Extraction with Instruction Tuning
    Sun, Lin
    Zhang, Kai
    Li, Qingyuan
    Lou, Renze
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19062 - 19070
  • [3] SMART: Submodular Data Mixture Strategy for Instruction Tuning
    Renduchintala, H. S. V. N. S. Kowndinya
    Bhatia, Sumit
    Ramakrishnan, Ganesh
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 12916 - 12934
  • [4] FashionGPT: LLM instruction fine-tuning with multiple LoRA-adapter fusion
    Gao, Dehong
    Ma, Yufei
    Liu, Sen
    Song, Mengfei
    Jin, Linbo
    Jiang, Wen
    Wang, Xin
    Ning, Wei
    Yu, Shanqing
    Xuan, Qi
    Cai, Xiaoyan
    Yang, Libin
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [5] Instruction Tuning-Free Visual Token Complement for Multimodal LLMs
    Wang, Dongsheng
    Cui, Jiequan
    Li, Miaoge
    Lin, Wang
    Chen, Bo
    Zhang, Hanwang
    COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 446 - 462
  • [6] Multi-LoRA continual learning based instruction tuning framework for universal information extraction
    Jin, Yu
    Liu, Jie
    Chen, Shaowei
    KNOWLEDGE-BASED SYSTEMS, 2025, 308
  • [7] Chain-of-LoRA: Enhancing the Instruction Fine-Tuning Performance of Low-Rank Adaptation on Diverse Instruction Set
    Qiu, Xihe
    Hao, Teqi
    Shi, Shaojie
    Tan, Xiaoyu
    Xiong, Yu-Jie
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 875 - 879
  • [8] MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
    Zheng, Jianbin
    Liu, Daqing
    Wang, Chaoyue
    Hu, Minghui
    Yang, Zuopeng
    Ding, Changxing
    Tao, Dacheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 3537 - 3565
  • [9] MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
    Zheng, Jianbin
    Liu, Daqing
    Wang, Chaoyue
    Hu, Minghui
    Yang, Zuopeng
    Ding, Changxing
    Tao, Dacheng
    arXiv, 2023,
  • [10] Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
    Lee, Young-Suk
    Sultan, Md Arafat
    El-Kurdi, Yousef
Naseem, Tahira
Munawar, Asim
    Florian, Radu
    Roukos, Salim
    Astudillo, Ramon Fernandez
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 12561 - 12571