EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Cited by: 0

Authors:
Zhao, Xiangyu [1 ]
Liu, Bo [1 ]
Liu, Qijiong [1 ]
Shi, Guangyuan [1 ]
Wu, Xiao-Ming [1 ]
Affiliations:
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords:
DOI: None available
CLC number: TP18 [Theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.
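
The abstract describes two lightweight bridging modules: a projection layer that maps BiDiffuser's image features into the LLM's embedding space for text generation, and an adapter that maps the LLM's text representations back into BiDiffuser's conditioning space for image generation. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class names, layer choices (a single linear projection, a two-layer MLP adapter), and all dimensions are illustrative assumptions.

# Minimal sketch (assumed, not from the paper): two bridging modules
# between a diffusion model's feature space and an LLM's embedding space.
import torch
import torch.nn as nn


class VisionToLLMProjection(nn.Module):
    """Projects diffusion-model image features into the LLM embedding space."""

    def __init__(self, diff_dim: int = 1536, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(diff_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_tokens, diff_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(image_feats)


class LLMToDiffusionAdapter(nn.Module):
    """Aligns LLM text hidden states with the diffusion model's conditioning space."""

    def __init__(self, llm_dim: int = 4096, cond_dim: int = 768):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(llm_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, seq_len, llm_dim) -> (batch, seq_len, cond_dim)
        return self.adapter(llm_hidden)


if __name__ == "__main__":
    # Shape check with dummy tensors.
    proj = VisionToLLMProjection()
    adapter = LLMToDiffusionAdapter()
    print(proj(torch.randn(2, 77, 1536)).shape)     # torch.Size([2, 77, 4096])
    print(adapter(torch.randn(2, 32, 4096)).shape)  # torch.Size([2, 32, 768])

In this sketch, the projection output would be prepended to the LLM's text token embeddings, while the adapter output would serve as conditioning input to the diffusion model; the actual training objectives and wiring follow the paper.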
Pages: 1351-1370
Page count: 20
Related papers (50 in total):
  • [21] Multimodal LLMs Struggle with Basic Visual Network Analysis: A VNA Benchmark
    Williams, Evan M.
    Carley, Kathleen M.
    SOCIAL, CULTURAL, AND BEHAVIORAL MODELING, SBP-BRIMS 2024, 2024, 14972 : 15 - 24
  • [22] Instruction Tuning-Free Visual Token Complement for Multimodal LLMs
    Wang, Dongsheng
    Cui, Jiequan
    Li, Miaoge
    Lin, Wang
    Chen, Bo
    Zhang, Hanwang
    COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 446 - 462
  • [23] Towards Efficient Data Wrangling with LLMs using Code Generation
    Li, Xue
    Dohmen, Till
    PROCEEDINGS OF THE 8TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2024, 2024
  • [24] Repair Is Nearly Generation: Multilingual Program Repair with LLMs
    Joshi, Harshit
    Sanchez, Jose Cambronero
    Gulwani, Sumit
    Le, Vu
    Radicek, Ivan
    Verbruggen, Gust
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4, 2023, : 5131 - 5140
  • [25] LLMs for science: Usage for code generation and data analysis
    Nejjar, Mohamed
    Zacharias, Luca
    Stiehle, Fabian
    Weber, Ingo
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2025, 37 (01)
  • [26] Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval
    Rossetto, Federico
    Dalton, Jeffrey
    Murray-Smith, Roderick
    PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023, : 51 - 59
  • [27] Retrieval Augmented Generation with LLMs for Explaining Business Process Models
    Minor, Mirjam
    Kaucher, Eduard
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, ICCBR 2024, 2024, 14775 : 175 - 190
  • [28] TelecomRAG: Taming Telecom Standards with Retrieval Augmented Generation and LLMs
    Yilma, Girma M.
    Ayala-Romero, Jose A.
    Garcia-Saavedra, Andres
    Costa-Perez, Xavier
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2024, 54 (03) : 18 - 23
  • [29] Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs
    Zhou, Shang
    Yao, Feng
    Dong, Chengyu
    Wang, Zihan
    Shang, Jingbo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 4348 - 4362
  • [30] Model Generation with LLMs: From Requirements to UML Sequence Diagrams
    Ferrari, Alessio
    Abualhaija, Sallam
    Arora, Chetan
    32ND INTERNATIONAL REQUIREMENTS ENGINEERING CONFERENCE WORKSHOPS, REW 2024, 2024, : 291 - 300