EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Authors
Zhao, Xiangyu [1]
Liu, Bo [1]
Liu, Qijiong [1]
Shi, Guangyuan [1]
Wu, Xiao-Ming [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with the BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.
Pages: 1351-1370
Page count: 20