Text-to-Image Synthesis With Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction

Cited by: 3
Authors
Alhabeeb, Sarah K. [1]
Al-Shargabi, Amal A. [1]
Affiliations
[1] Qassim Univ, Coll Comp, Dept Informat Technol, Buraydah 51452, Saudi Arabia
Keywords
Deep learning; diffusion model; generative models; generative adversarial network; text-to-image synthesis; GAN
DOI
10.1109/ACCESS.2024.3365043
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text-to-image synthesis, the process of turning words into images, opens up a world of creative possibilities, and meets the growing need for engaging visual experiences in a world that is becoming more image-based. As machine learning capabilities expanded, the area progressed from simple tools and systems to robust deep learning models that can automatically generate realistic images from textual inputs. Modern, large-scale text-to-image generation models have made significant progress in this direction, producing diversified and high-quality images from text description prompts. Although several methods exist, Generative Adversarial Networks (GANs) have long held a position of prominence. However, diffusion models have recently emerged, with results much beyond those achieved by GANs. This study offers a concise overview of text-to-image generative models by examining the existing body of literature and providing a deeper understanding of this topic. This will be accomplished by providing a concise summary of the development of text-to-image synthesis, previous tools and systems employed in this field, key types of generative models, as well as an exploration of the relevant research conducted on GANs and diffusion models. Additionally, the study provides an overview of common datasets utilized for training the text-to-image model, compares the evaluation metrics used for evaluating the models, and addresses the challenges encountered in the field. Finally, concluding remarks are provided to summarize the findings and implications of the study and open issues for further research.
Pages: 24412-24427
Page count: 16