MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices

Cited by: 0
Authors
Zhao, Yang [1]
Xu, Yanwu [1]
Xiao, Zhisheng [1]
Jia, Haolin [1]
Hou, Tingbo [2]
Affiliations
[1] Google, New York, NY USA
[2] Meta GenAI, New York, NY USA
DOI
10.1007/978-3-031-73033-7_13
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and high latency. In this paper, we present MobileDiffusion, an ultra-efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to minimize model size and FLOPs while preserving image generation quality. Additionally, we revisit the advanced sampling technique of diffusion-GAN and make one-step sampling compatible with downstream applications trained on the base model. Empirical studies, both quantitative and qualitative, demonstrate the effectiveness of our proposed techniques. With them, MobileDiffusion achieves instant text-to-image generation on mobile devices, establishing a new state of the art.
Pages: 225-242
Page count: 18