MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

Cited by: 0
Authors
Zhao, Yang [1]
Xu, Yanwu [1]
Xiao, Zhisheng [1]
Jia, Haolin [1]
Hou, Tingbo [2]
Affiliations
[1] Google, New York, NY USA
[2] Meta GenAI, New York, NY USA
Source
Keywords
DOI
10.1007/978-3-031-73033-7_13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and high latency. In this paper, we present MobileDiffusion, an ultra-efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to minimize model size and FLOPs while preserving image generation quality. Additionally, we revisit diffusion-GAN, an advanced sampling technique, and make one-step sampling compatible with downstream applications trained on the base model. Empirical studies, both quantitative and qualitative, demonstrate the effectiveness of the proposed techniques. With them, MobileDiffusion achieves instant text-to-image generation on mobile devices, establishing a new state of the art.
Pages: 225-242
Page count: 18
Related papers
50 in total
  • [1] DDIMCACHE: AN ENHANCED TEXT-TO-IMAGE DIFFUSION MODEL ON MOBILE DEVICES
    Wu, Qifeng
    KYBERNETIKA, 2024, 60 (06) : 819 - 833
  • [2] Controllable Text-to-Image Generation
    Li, Bowen
    Qi, Xiaojuan
    Lukasiewicz, Thomas
    Torr, Philip H. S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Surgical text-to-image generation
    Nwoye, Chinedu Innocent
    Bose, Rupak
    Elgohary, Kareem
    Arboit, Lorenzo
    Carlino, Giorgio
    Lavanchy, Joel L.
    Mascagni, Pietro
    Padoy, Nicolas
    PATTERN RECOGNITION LETTERS, 2025, 190 : 73 - 80
  • [4] Expressive Text-to-Image Generation with Rich Text
    Ge, Songwei
    Park, Taesung
    Zhu, Jun-Yan
    Huang, Jia-Bin
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7511 - 7522
  • [5] Mobile App for Text-to-Image Synthesis
    Kang, Ryan
    Sunil, Athira
    Chen, Min
    MOBILE COMPUTING, APPLICATIONS, AND SERVICES, MOBICASE 2019, 2019, 290 : 32 - 43
  • [6] SEMANTICALLY INVARIANT TEXT-TO-IMAGE GENERATION
    Sah, Shagan
    Peri, Dheeraj
    Shringi, Ameya
    Zhang, Chi
    Dominguez, Miguel
    Savakis, Andreas
    Ptucha, Ray
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 3783 - 3787
  • [7] SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
    Li, Yanyu
    Wang, Huan
    Jin, Qing
    Hu, Ju
    Chemerys, Pavlo
    Fu, Yun
    Wang, Yanzhi
    Tulyakov, Sergey
    Ren, Jian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Semantics Disentangling for Text-to-Image Generation
    Yin, Guojun
    Liu, Bin
    Sheng, Lu
    Yu, Nenghai
    Wang, Xiaogang
    Shao, Jing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2322 - 2331
  • [9] Text-to-Image Generation for Abstract Concepts
    Liao, Jiayi
    Chen, Xu
    Fu, Qiang
    Du, Lun
    He, Xiangnan
    Wang, Xiang
    Han, Shi
    Zhang, Dongmei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3360 - 3368
  • [10] Shifted Diffusion for Text-to-image Generation
    Zhou, Yufan
    Liu, Bingchen
    Zhu, Yizhe
    Yang, Xiao
    Chen, Changyou
    Xu, Jinhui
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10157 - 10166