EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Cited by: 0

Authors:
Zhao, Xiangyu [1 ]
Liu, Bo [1 ]
Liu, Qijiong [1 ]
Shi, Guangyuan [1 ]
Wu, Xiao-Ming [1 ]
Affiliations:
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords:
DOI: None available
CLC number: TP18 [Theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.
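
The abstract describes two lightweight bridging modules: a projection layer that maps BiDiffuser's image features into the LLM's embedding space for text generation, and an adapter that maps the LLM's text representations back into BiDiffuser's conditioning space for image generation. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class names, layer choices (a single linear projection, a two-layer MLP adapter), and all dimensions are illustrative assumptions.

# Minimal sketch (assumed, not from the paper): two bridging modules
# between a diffusion model's feature space and an LLM's embedding space.
import torch
import torch.nn as nn


class VisionToLLMProjection(nn.Module):
    """Projects diffusion-model image features into the LLM embedding space."""

    def __init__(self, diff_dim: int = 1536, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(diff_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_tokens, diff_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(image_feats)


class LLMToDiffusionAdapter(nn.Module):
    """Aligns LLM text hidden states with the diffusion model's conditioning space."""

    def __init__(self, llm_dim: int = 4096, cond_dim: int = 768):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(llm_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, seq_len, llm_dim) -> (batch, seq_len, cond_dim)
        return self.adapter(llm_hidden)


if __name__ == "__main__":
    # Shape check with dummy tensors.
    proj = VisionToLLMProjection()
    adapter = LLMToDiffusionAdapter()
    print(proj(torch.randn(2, 77, 1536)).shape)     # torch.Size([2, 77, 4096])
    print(adapter(torch.randn(2, 32, 4096)).shape)  # torch.Size([2, 32, 768])

In this sketch, the projection output would be prepended to the LLM's text token embeddings, while the adapter output would serve as conditioning input to the diffusion model; the actual training objectives and wiring follow the paper.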
Pages: 1351-1370
Page count: 20
Related papers (50 in total):
  • [21] Multimodal LLMs Struggle with Basic Visual Network Analysis: A VNA Benchmark
    Williams, Evan M.
    Carley, Kathleen M.
    SOCIAL, CULTURAL, AND BEHAVIORAL MODELING, SBP-BRIMS 2024, 2024, 14972 : 15 - 24
  • [22] Instruction Tuning-Free Visual Token Complement for Multimodal LLMs
    Wang, Dongsheng
    Cui, Jiequan
    Li, Miaoge
    Lin, Wang
    Chen, Bo
    Zhang, Hanwang
    COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 446 - 462
  • [23] Towards Efficient Data Wrangling with LLMs using Code Generation
    Li, Xue
    Dohmen, Till
    PROCEEDINGS OF THE 8TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2024, 2024
  • [24] Repair Is Nearly Generation: Multilingual Program Repair with LLMs
    Joshi, Harshit
    Sanchez, Jose Cambronero
    Gulwani, Sumit
    Le, Vu
    Radicek, Ivan
    Verbruggen, Gust
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 5131 - 5140
  • [25] LLMs for science: Usage for code generation and data analysis
    Nejjar, Mohamed
    Zacharias, Luca
    Stiehle, Fabian
    Weber, Ingo
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2025, 37 (01)
  • [26] Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval
    Rossetto, Federico
    Dalton, Jeffrey
    Murray-Smith, Roderick
    PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023, : 51 - 59
  • [27] Retrieval Augmented Generation with LLMs for Explaining Business Process Models
    Minor, Mirjam
    Kaucher, Eduard
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, ICCBR 2024, 2024, 14775 : 175 - 190
  • [28] TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs
    Yilma, Girma M.
    Ayala-Romero, Jose A.
    Garcia-Saavedra, Andres
    Costa-Perez, Xavier
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2024, 54 (03) : 18 - 23
  • [29] Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs
    Zhou, Shang
    Yao, Feng
    Dong, Chengyu
    Wang, Zihan
    Shang, Jingbo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 4348 - 4362
  • [30] Model Generation with LLMs: From Requirements to UML Sequence Diagrams
    Ferrari, Alessio
    Abualhaija, Sallam
    Arora, Chetan
    32ND INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW 2024, 2024, : 291 - 300