MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

Cited: 0
Authors
Bellagente, Marco [4 ]
Brack, Manuel [2 ,3 ]
Teufel, Hannah [1 ]
Friedrich, Felix [3 ,6 ]
Deiseroth, Bjoern [1 ,3 ,6 ]
Eichenberg, Constantin [1 ]
Dai, Andrew [1 ]
Baldock, Robert J. N. [1 ]
Nanda, Souradeep [5 ]
Oostermeijer, Koen [1 ]
Cruz-Salinas, Andres Felipe [1 ]
Schramowski, Patrick [2 ,3 ,6 ,8 ]
Kersting, Kristian [2 ,3 ,6 ,7 ]
Weinbach, Samuel [1 ]
Affiliations
[1] Aleph Alpha, Heidelberg, Germany
[2] German Res Ctr Artificial Intelligence DFKI, Kaiserslautern, Germany
[3] Tech Univ Darmstadt, Comp Sci Dept, Darmstadt, Germany
[4] Stabil AI, London, England
[5] Univ Texas Dallas, Dallas, TX USA
[6] Hessian AI, Darmstadt, Germany
[7] Tech Univ Darmstadt, Ctr Cognit Sci, Darmstadt, Germany
[8] LAION, Hamburg, Germany
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
EU Horizon 2020;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MULTIFUSION, which allows users to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MULTIFUSION leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from the individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
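The abstract describes the core mechanism only at a high level: pre-trained modules are aligned so that the image generation module is conditioned on an embedding sequence produced from interleaved multilingual, multimodal inputs. The toy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation, and all names, dimensions, and the patch-projection adapter (InterleavedEncoder, CrossAttnDenoiser, patch_dim) are hypothetical stand-ins.

# Minimal conceptual sketch (assumed structure, not the paper's code): a frozen
# multimodal, multilingual encoder produces one conditioning sequence that a
# cross-attention denoiser consumes, even though the denoiser itself never saw
# multimodal inputs during training.
import torch
import torch.nn as nn


class InterleavedEncoder(nn.Module):
    """Stand-in for a frozen multilingual LM with an image-prefix adapter.

    Text tokens and image patches are mapped into one shared embedding
    sequence, so prompts may interleave both modalities arbitrarily.
    """

    def __init__(self, vocab_size=1000, dim=64, patch_dim=3 * 16 * 16):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, dim)  # frozen LM embeddings
        self.image_adapter = nn.Linear(patch_dim, dim)    # small trained adapter

    def forward(self, segments):
        """segments: list of ('text', LongTensor[n]) or ('image', Tensor[m, patch_dim])."""
        parts = []
        for kind, data in segments:
            if kind == "text":
                parts.append(self.token_embed(data))
            else:  # image patches projected into the LM's embedding space
                parts.append(self.image_adapter(data))
        return torch.cat(parts, dim=0)  # [seq_len, dim] conditioning sequence


class CrossAttnDenoiser(nn.Module):
    """Stand-in for the diffusion denoiser: it sees the prompt only as an
    embedding sequence via cross-attention, so a new encoder can drive it
    as long as the embedding spaces are aligned."""

    def __init__(self, dim=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_latent, cond):
        # noisy_latent: [1, latent_len, dim], cond: [1, cond_len, dim]
        attended, _ = self.attn(noisy_latent, cond, cond)
        return self.out(attended)  # toy noise prediction


if __name__ == "__main__":
    enc, denoiser = InterleavedEncoder(), CrossAttnDenoiser()
    prompt = [
        ("text", torch.randint(0, 1000, (5,))),   # text tokens in any language
        ("image", torch.randn(4, 3 * 16 * 16)),   # reference-image patches
        ("text", torch.randint(0, 1000, (3,))),   # trailing text tokens
    ]
    cond = enc(prompt).unsqueeze(0)               # [1, 12, 64]
    noise_pred = denoiser(torch.randn(1, 16, 64), cond)
    print(noise_pred.shape)                       # torch.Size([1, 16, 64])

In this toy setup only the small image adapter would require training; the point is that the denoiser is conditioned purely through an embedding sequence, so capabilities of the fused encoder (other languages, image inputs) transfer to generation without retraining the generator from scratch.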
Pages: 20
Related Papers
50 records in total
  • [41] MtArtGPT: A Multi-Task Art Generation System With Pre-Trained Transformer
    Jin, Cong
    Zhu, Ruolin
    Zhu, Zixing
    Yang, Lu
    Yang, Min
    Luo, Jiebo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 6901 - 6912
  • [42] Fusing EO and LiDAR for SAR Image Translation with Multi-Modal Generative Adversarial Networks
    Zhu, Jiang
    Qing, Yuanyuan
    Lin, Zhiping
    Wen, Kilian
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [43] TraVL: Transferring Pre-trained Visual-Linguistic Models for Cross-Lingual Image Captioning
    Zhang, Zhebin
    Lu, Peng
    Jiang, Dawei
    Chen, Gang
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 341 - 355
  • [44] Cross-Modal Retrieval Algorithm for Image and Text Based on Pre-Trained Models and Encoders
    Chen X.
    Peng J.
    Zhang P.
    Luo Z.
    Ou Z.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2023, 46 (05): : 112 - 117
  • [45] Hybrid multi-document summarization using pre-trained language models
    Ghadimi, Alireza
    Beigy, Hamid
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 192
  • [46] TED Talk Teaser Generation with Pre-Trained Models
    Vico, Gianluca
    Niehues, Jan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8067 - 8071
  • [47] MaxFusion: Plug&Play Multi-modal Generation in Text-to-Image Diffusion Models
    Nair, Nithin Gopalakrishnan
    Valanarasu, Jeya Maria Jose
    Patel, Vishal M.
    COMPUTER VISION-ECCV 2024, PT XXXVIII, 2025, 15096 : 93 - 110
  • [48] Pre-Trained Language Models for Text Generation: A Survey
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Nie, Jian-Yun
    Wen, Ji-Rong
    ACM COMPUTING SURVEYS, 2024, 56 (09)
  • [49] Leveraging pre-trained language models for code generation
    Soliman, Ahmed
    Shaheen, Samir
    Hadhoud, Mayada
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 3955 - 3980
  • [50] Multi-modal lung ultrasound image classification by fusing image-based features and probe information
    Okolo, Gabriel Iluebe
    Katsigiannis, Stamos
    Ramzan, Naeem
    2022 IEEE 22ND INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2022), 2022, : 45 - 50