EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Cited by: 0
Authors
Zhao, Xiangyu [1 ]
Liu, Bo [1 ]
Liu, Qijiong [1 ]
Shi, Guangyuan [1 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords: not listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and require ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.
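The abstract describes two lightweight alignment modules: a projection layer that maps BiDiffuser's image features into the LLM's embedding space (for text generation), and an adapter that maps the LLM's text representations into BiDiffuser's conditioning space (for image generation). The following is a minimal PyTorch-style sketch of what such modules could look like; the class names, dimensions, and architecture details are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two alignment modules described in the abstract.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class ImageToLLMProjection(nn.Module):
    """Projects BiDiffuser image features into the LLM's embedding space
    so the LLM can condition on them when generating text."""

    def __init__(self, diffuser_dim: int = 1536, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(diffuser_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_tokens, diffuser_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(image_feats)


class LLMToDiffuserAdapter(nn.Module):
    """Maps LLM hidden states into BiDiffuser's text-conditioning space
    so the diffusion model can generate images from the LLM's output."""

    def __init__(self, llm_dim: int = 4096, cond_dim: int = 768, hidden: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(llm_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, cond_dim),
        )

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, seq_len, llm_dim) -> (batch, seq_len, cond_dim)
        return self.mlp(llm_hidden)
```

In this reading, only these small modules (and not the frozen LLM or diffusion backbone) need to be trained, which is consistent with the paper's claim of data-efficient training.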
Pages: 1351-1370
Page count: 20
Related papers (50 in total)
  • [31] Computing Architecture for Large-Language Models (LLMs) and Large Multimodal Models (LMMs)
    Liang, Bor-Sung
    PROCEEDINGS OF THE 2024 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN, ISPD 2024, 2024, : 233 - 234
  • [32] HiA: Towards Chinese Multimodal LLMs for Comparative High-Resolution Joint Diagnosis
    Ding, Xinpeng
    Chu, Yongqiang
    Pi, Renjie
    Wang, Hualiang
    Li, Xiaomeng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012 : 575 - 586
  • [33] Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-Text Transformation
    Gou, Yunhao
    Chen, Kai
    Liu, Zhili
    Hong, Lanqing
    Xu, Hang
    Li, Zhenguo
    Yeung, Dit-Yan
    Kwok, James T.
    Zhang, Yu
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 388 - 404
  • [34] Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate
    Qureshi, M. Atif
    Younus, Arjumand
    Caton, Simon
    WEB ENGINEERING, ICWE 2024, 2024, 14629 : 34 - 48
  • [35] Easing the way to installing the next generation optical transmission platforms
    Quarton, Robert
    Elektron, 2004, 21 (01): : 34 - 36
  • [36] Evaluation of Orca 2 Against Other LLMs for Retrieval Augmented Generation
    Huang, Donghao
    Wang, Zhaoxia
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA, 2024, 14658 : 3 - 19
  • [37] AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design
    Qiu, Ruidi
    Zhang, Grace Li
    Drechsler, Rolf
    Schlichtmann, Ulf
    Li, Bing
    2024 ACM/IEEE 6TH SYMPOSIUM ON MACHINE LEARNING FOR CAD, MLCAD 2024, 2024,
  • [38] Learning Preference Model for LLMs via Automatic Preference Data Generation
    Huang, Shijia
    Zhao, Jianqiao
    Li, Yanyang
    Wang, Liwei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9187 - 9199
  • [39] LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving
    Fu, Daocheng
    Lei, Wenjie
    Wen, Licheng
    Cai, Pinlong
    Mao, Song
    Dou, Min
    Shi, Botian
    Qiao, Yu
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1084 - 1090
  • [40] A Retrospective on Whole Test Suite Generation: On the Role of SBST in the Age of LLMs
    Fraser, Gordon
    Arcuri, Andrea
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (03) : 874 - 878