Multimodal Instruction Tuning with Conditional Mixture of LoRA

Cited by: 0
Authors
Shen, Ying [1 ]
Xu, Zhiyang [1 ]
Wang, Qifan
Cheng, Yu [2 ]
Yin, Wenpeng [3 ]
Huang, Lifu [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[3] Penn State Univ, University Pk, PA 16802 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models on diverse multimodal tasks through instructions. As MLLMs grow in complexity and size, parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which fine-tunes with a minimal set of parameters, become essential. However, applying LoRA in multimodal instruction tuning presents the challenge of task interference, which leads to performance degradation, especially when dealing with a broad array of multimodal tasks. To address this, the paper introduces a novel approach that integrates multimodal instruction tuning with Conditional Mixture-of-LoRA (MixLoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance, with the aim of mitigating task interference. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA at the same or even higher ranks, demonstrating its efficacy and adaptability across diverse multimodal tasks.
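
As a rough illustration of the mechanism described in the abstract, the following is a minimal, hypothetical PyTorch sketch of a conditional mixture-of-LoRA layer: instead of one static pair of low-rank matrices, the layer keeps pools of rank-1 factors and a router that mixes them into instance-specific A and B matrices conditioned on an embedding of the input. The class name, the softmax routing scheme, and the use of a pooled instance embedding are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalMixLoRALinear(nn.Module):
    # Hypothetical sketch of a Conditional Mixture-of-LoRA layer (not the
    # authors' exact design). A frozen pretrained linear layer is augmented
    # with a low-rank update whose factors are composed per instance from
    # shared pools of rank-1 factors, guided by small routing networks.
    def __init__(self, in_features, out_features, rank=4, pool_size=32, scale=1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        self.rank, self.scale = rank, scale
        # Pools of candidate rank-1 factors for the down- and up-projections.
        self.A_pool = nn.Parameter(torch.randn(pool_size, in_features) * 0.01)
        self.B_pool = nn.Parameter(torch.zeros(pool_size, out_features))
        # Routers map an instance embedding (assumed here to have dimension
        # in_features) to one mixing distribution over the pool per rank slot.
        self.router_A = nn.Linear(in_features, rank * pool_size)
        self.router_B = nn.Linear(in_features, rank * pool_size)

    def forward(self, x, instance_emb):
        # x: (batch, seq, in_features); instance_emb: (batch, in_features)
        b = x.size(0)
        w_A = F.softmax(self.router_A(instance_emb).view(b, self.rank, -1), dim=-1)
        w_B = F.softmax(self.router_B(instance_emb).view(b, self.rank, -1), dim=-1)
        # Instance-specific low-rank matrices A: (b, rank, in), B: (b, rank, out).
        A = torch.einsum("brp,pi->bri", w_A, self.A_pool)
        B = torch.einsum("brp,po->bro", w_B, self.B_pool)
        # LoRA-style update: base(x) + scale * ((x A^T) B).
        delta = torch.einsum("bsi,bri->bsr", x, A)
        delta = torch.einsum("bsr,bro->bso", delta, B)
        return self.base(x) + self.scale * delta

# Usage with arbitrary shapes, e.g. a 768-dim hidden size:
#   layer = ConditionalMixLoRALinear(768, 768)
#   y = layer(torch.randn(2, 16, 768), torch.randn(2, 768))

In this sketch only the factor pools and routers are trainable, which keeps the number of tuned parameters in the same regime as standard LoRA while letting the effective adapter vary per input instance.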
Pages: 637-648
Number of pages: 12
Related Papers
50 records in total
  • [1] Instruction Tuning Large Language Models for Multimodal Relation Extraction Using LoRA
    Li, Zou
    Pang, Ning
    Zhao, Xiang
    WEB INFORMATION SYSTEMS AND APPLICATIONS, WISA 2024, 2024, 14883 : 364 - 376
  • [2] UMIE: Unified Multimodal Information Extraction with Instruction Tuning
    Sun, Lin
    Zhang, Kai
    Li, Qingyuan
    Lou, Renze
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19062 - 19070
  • [3] SMART: Submodular Data Mixture Strategy for Instruction Tuning
    Renduchintala, H. S. V. N. S. Kowndinya
    Bhatia, Sumit
    Ramakrishnan, Ganesh
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 12916 - 12934
  • [4] FashionGPT: LLM instruction fine-tuning with multiple LoRA-adapter fusion
    Gao, Dehong
    Ma, Yufei
    Liu, Sen
    Song, Mengfei
    Jin, Linbo
    Jiang, Wen
    Wang, Xin
    Ning, Wei
    Yu, Shanqing
    Xuan, Qi
    Cai, Xiaoyan
    Yang, Libin
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [5] Instruction Tuning-Free Visual Token Complement for Multimodal LLMs
    Wang, Dongsheng
    Cui, Jiequan
    Li, Miaoge
    Lin, Wang
    Chen, Bo
    Zhang, Hanwang
    COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 446 - 462
  • [6] Multi-LoRA continual learning based instruction tuning framework for universal information extraction
    Jin, Yu
    Liu, Jie
    Chen, Shaowei
    KNOWLEDGE-BASED SYSTEMS, 2025, 308
  • [7] Chain-of-LoRA: Enhancing the Instruction Fine-Tuning Performance of Low-Rank Adaptation on Diverse Instruction Set
    Qiu, Xihe
    Hao, Teqi
    Shi, Shaojie
    Tan, Xiaoyu
    Xiong, Yu-Jie
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 875 - 879
  • [8] MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
    Zheng, Jianbin
    Liu, Daqing
    Wang, Chaoyue
    Hu, Minghui
    Yang, Zuopeng
    Ding, Changxing
    Tao, Dacheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 3537 - 3565
  • [9] MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
    Zheng, Jianbin
    Liu, Daqing
    Wang, Chaoyue
    Hu, Minghui
    Yang, Zuopeng
    Ding, Changxing
    Tao, Dacheng
    arXiv, 2023,
  • [10] Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
    Lee, Young-Suk
    Sultan, Md Arafat
    El-Kurdi, Yousef
Naseem, Tahira
Munawar, Asim
    Florian, Radu
    Roukos, Salim
    Astudillo, Ramon Fernandez
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 12561 - 12571