Instruction Tuning-Free Visual Token Complement for Multimodal LLMs

Times cited: 0
Authors
Wang, Dongsheng [1]
Cui, Jiequan [2]
Li, Miaoge [3]
Lin, Wang [4]
Chen, Bo [5]
Zhang, Hanwang [2]
Affiliations
[1] Shenzhen Univ, Shenzhen 518052, Peoples R China
[2] Nanyang Technol Univ, 50 Nanyang Ave, Singapore 639798, Singapore
[3] Hong Kong Polytech Univ, Hung Hom, Kowloon, Hong Kong, Peoples R China
[4] Zhejiang Univ, Hangzhou 310058, Peoples R China
[5] Xidian Univ, Xian 710126, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1007/978-3-031-73004-7_26
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (VTC) that helps MLLMs regain the missing visual features and thus improve response accuracy. Specifically, VTC uses text-to-image generation as a guide to identify the text-irrelevant features, and a visual selector is then developed to generate complementary visual tokens that enrich the original visual input. Moreover, an iterative strategy extracts more visual information by applying the visual selector repeatedly without any additional training. Notably, the training pipeline requires no additional image-text pairs, giving VTC a desirable instruction-tuning-free property. Both qualitative and quantitative experiments demonstrate the superiority and efficiency of VTC.
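The abstract describes the complement-and-iterate idea only at a high level. The toy Python sketch below illustrates that general pattern under loudly stated assumptions; it is not the authors' VTC implementation. Assumptions: visual tokens are plain float vectors; `text_recon` is a hypothetical stand-in for features of an image regenerated from the text, so the residual `token - text_recon` approximates "text-irrelevant" content; the residual-norm ranking in `select_complement` is a hypothetical substitute for the paper's visual selector; and all function names are invented for illustration.

```python
import math
from typing import List

Vector = List[float]


def residual(original: Vector, recovered: Vector) -> Vector:
    """Feature content in `original` not yet captured by `recovered`."""
    return [o - r for o, r in zip(original, recovered)]


def norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def select_complement(tokens: List[Vector], recovered: List[Vector], k: int) -> List[int]:
    """Hypothetical stand-in for a visual selector: return the indices of up to
    k tokens with the largest nonzero residual norm (most 'missing' content)."""
    scores = [norm(residual(t, r)) for t, r in zip(tokens, recovered)]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:k]


def iterative_complement(tokens: List[Vector], text_recon: List[Vector],
                         k: int, steps: int) -> List[Vector]:
    """Repeatedly apply the same selection rule (no parameters are updated)
    and append the selected residuals as complementary tokens."""
    recovered = [list(r) for r in text_recon]  # copy so the input is not mutated
    complements: List[Vector] = []
    for _ in range(steps):
        for i in select_complement(tokens, recovered, k):
            complements.append(residual(tokens[i], recovered[i]))
            recovered[i] = list(tokens[i])  # this token's missing content is now covered
    # Enriched input: original visual tokens followed by complementary tokens.
    return tokens + complements


tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
text_recon = [[1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
enriched = iterative_complement(tokens, text_recon, k=1, steps=2)
print(enriched)
```

In this toy run, the first iteration picks the third token (residual norm 2.0) and the second picks the second token (residual norm 1.5), so two complementary tokens are appended after the three originals. A further iteration adds nothing because every residual is then zero.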
Pages: 446-462
Number of pages: 17