EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs

Cited by: 0
Authors
Zhao, Xiangyu [1 ]
Liu, Bo [1 ]
Liu, Qijiong [1 ]
Shi, Guangyuan [1 ]
Wu, Xiao-Ming [1 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
Keywords: not listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and require ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.
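The abstract describes two lightweight alignment modules: a projection layer that maps BiDiffuser's image features into the LLM's embedding space (for text generation), and an adapter that maps the LLM's text representations into BiDiffuser's conditioning space (for image generation). The following is a minimal PyTorch-style sketch of what such modules could look like; the class names, dimensions, and architecture details are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two alignment modules described in the abstract.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn


class ImageToLLMProjection(nn.Module):
    """Projects BiDiffuser image features into the LLM's embedding space
    so the LLM can condition on them when generating text."""

    def __init__(self, diffuser_dim: int = 1536, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(diffuser_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_tokens, diffuser_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(image_feats)


class LLMToDiffuserAdapter(nn.Module):
    """Maps LLM hidden states into BiDiffuser's text-conditioning space
    so the diffusion model can generate images from the LLM's output."""

    def __init__(self, llm_dim: int = 4096, cond_dim: int = 768, hidden: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(llm_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, cond_dim),
        )

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, seq_len, llm_dim) -> (batch, seq_len, cond_dim)
        return self.mlp(llm_hidden)
```

In this reading, only these small modules (and not the frozen LLM or diffusion backbone) need to be trained, which is consistent with the paper's claim of data-efficient training.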
Pages: 1351-1370
Page count: 20
Related papers (50 in total)
  • [31] Computing Architecture for Large-Language Models (LLMs) and Large Multimodal Models (LMMs)
    Liang, Bor-Sung
    PROCEEDINGS OF THE 2024 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN, ISPD 2024, 2024, : 233 - 234
  • [32] HiA: Towards Chinese Multimodal LLMs for Comparative High-Resolution Joint Diagnosis
    Ding, Xinpeng
    Chu, Yongqiang
    Pi, Renjie
    Wang, Hualiang
    Li, Xiaomeng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT XII, 2024, 15012 : 575 - 586
  • [33] Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-Text Transformation
    Gou, Yunhao
    Chen, Kai
    Liu, Zhili
    Hong, Lanqing
    Xu, Hang
    Li, Zhenguo
    Yeung, Dit-Yan
    Kwok, James T.
    Zhang, Yu
    COMPUTER VISION - ECCV 2024, PT XVII, 2025, 15075 : 388 - 404
  • [34] Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate
    Qureshi, M. Atif
    Younus, Arjumand
    Caton, Simon
    WEB ENGINEERING, ICWE 2024, 2024, 14629 : 34 - 48
  • [35] Easing the way to installing the next generation optical transmission platforms
    Quarton, Robert
    Elektron, 2004, 21 (01): : 34 - 36
  • [36] Evaluation of Orca 2 Against Other LLMs for Retrieval Augmented Generation
    Huang, Donghao
    Wang, Zhaoxia
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA, 2024, 14658 : 3 - 19
  • [37] AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design
    Qiu, Ruidi
    Zhang, Grace Li
    Drechsler, Rolf
    Schlichtmann, Ulf
    Li, Bing
    2024 ACM/IEEE 6TH SYMPOSIUM ON MACHINE LEARNING FOR CAD, MLCAD 2024, 2024,
  • [38] Learning Preference Model for LLMs via Automatic Preference Data Generation
    Huang, Shijia
    Zhao, Jianqiao
    Li, Yanyang
    Wang, Liwei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9187 - 9199
  • [39] LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving
    Fu, Daocheng
    Lei, Wenjie
    Wen, Licheng
    Cai, Pinlong
    Mao, Song
    Dou, Min
    Shi, Botian
    Qiao, Yu
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1084 - 1090
  • [40] A Retrospective on Whole Test Suite Generation: On the Role of SBST in the Age of LLMs
    Fraser, Gordon
    Arcuri, Andrea
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, 51 (03) : 874 - 878