CogView: Mastering Text-to-Image Generation via Transformers

Authors
Ding, Ming [1]
Yang, Zhuoyi [1]
Hong, Wenyi [1]
Zheng, Wendi [1]
Zhou, Chang [2]
Yin, Da [1]
Lin, Junyang [2]
Zou, Xu [1]
Shao, Zhou [3]
Yang, Hongxia [2]
Tang, Jie [1, 3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] DAMO Acad, Alibaba Grp, Hangzhou, Peoples R China
[3] BAAI, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-Image generation in the general domain has long been an open problem that requires both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate finetuning strategies for various downstream tasks, e.g. style learning, super-resolution, text-image ranking and fashion design, and methods to stabilize pretraining, e.g. eliminating NaN losses. CogView achieves a state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and a recent similar work, DALL-E.
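The abstract's core idea, a Transformer operating on discrete tokens produced by a VQ-VAE tokenizer, can be illustrated with a minimal sketch. This is not CogView's implementation: the toy codebook, the vocabulary sizes, the single begin-of-image separator and all function names (`quantize`, `build_sequence`, `next_token_pairs`) are hypothetical. It shows only the general pattern: each image-encoder output vector is replaced by the index of its nearest codebook entry, and those indices are appended after the text tokens so that one autoregressive model can predict image tokens conditioned on text.

```python
# Hypothetical toy codebook: K = 4 code vectors of dimension D = 2.
# A real VQ-VAE learns a much larger codebook; these values are illustrative only.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def quantize(features, codebook=CODEBOOK):
    """Map each encoder output vector to the index of its nearest code vector (squared L2)."""
    indices = []
    for z in features:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


def build_sequence(text_ids, image_ids, text_vocab_size, codebook_size):
    """Concatenate text tokens, a begin-of-image separator, and shifted image tokens.

    Hypothetical layout (not CogView's exact token arrangement):
      text tokens  -> 0 .. text_vocab_size - 1
      image tokens -> text_vocab_size .. text_vocab_size + codebook_size - 1
      separator    -> text_vocab_size + codebook_size
    so all tokens can share a single embedding table.
    """
    boi_id = text_vocab_size + codebook_size
    return list(text_ids) + [boi_id] + [text_vocab_size + i for i in image_ids]


def next_token_pairs(seq):
    """Autoregressive training pairs: predict seq[t + 1] from the prefix seq[:t + 1]."""
    return [(seq[: t + 1], seq[t + 1]) for t in range(len(seq) - 1)]


if __name__ == "__main__":
    feats = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.7, 0.9]]
    codes = quantize(feats)
    seq = build_sequence([5, 7], codes, text_vocab_size=10, codebook_size=4)
    print(codes)  # [0, 1, 2, 3]
    print(seq)    # [5, 7, 14, 10, 11, 12, 13]
```

In a full pipeline of this kind, generation would sample the tokens after the separator one at a time and pass the resulting codebook indices to the VQ-VAE decoder to reconstruct pixels; the encoder, decoder and Transformer themselves are omitted here.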
Pages: 14