MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

Cited by: 0

Authors:

Zhao, Yang [1]

Xu, Yanwu [1]

Xiao, Zhisheng [1]

Jia, Haolin [1]

Hou, Tingbo [2]

Affiliations:

[1] Google, New York, NY USA

[2] Meta GenAI, New York, NY USA

Source:

COMPUTER VISION - ECCV 2024, PT LXII | 2025 / Vol. 15120

DOI:

10.1007/978-3-031-73033-7_13

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and high latency. In this paper, we present MobileDiffusion, an ultra-efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to minimize model size and FLOPs while preserving image generation quality. Additionally, we revisit the advanced diffusion-GAN sampling technique and make one-step sampling compatible with downstream applications trained on the base model. Quantitative and qualitative empirical studies demonstrate the effectiveness of the proposed techniques. With them, MobileDiffusion achieves instant text-to-image generation on mobile devices, establishing a new state of the art.

Pages: 225-242

Page count: 18

