EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Cited by: 0

Authors:

Zhao, Xiangyu [1]
Liu, Bo [1]
Liu, Qijiong [1]
Shi, Guangyuan [1]
Wu, Xiao-Ming [1]

Affiliation:

[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS | 2024

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need large amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.

Pages: 1351-1370

Page count: 20
