Text-to-Image Synthesis With Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction

Cited by: 3

Authors:

Alhabeeb, Sarah K. [1]

Al-Shargabi, Amal A. [1]

Affiliations:

[1] Qassim Univ, Coll Comp, Dept Informat Technol, Buraydah 51452, Saudi Arabia

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Deep learning; diffusion model; generative models; generative adversarial network; text-to-image synthesis; GAN;

DOI:

10.1109/ACCESS.2024.3365043

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Text-to-image synthesis, the process of turning words into images, opens up a world of creative possibilities, and meets the growing need for engaging visual experiences in a world that is becoming more image-based. As machine learning capabilities expanded, the area progressed from simple tools and systems to robust deep learning models that can automatically generate realistic images from textual inputs. Modern, large-scale text-to-image generation models have made significant progress in this direction, producing diversified and high-quality images from text description prompts. Although several methods exist, Generative Adversarial Networks (GANs) have long held a position of prominence. However, diffusion models have recently emerged, with results much beyond those achieved by GANs. This study offers a concise overview of text-to-image generative models by examining the existing body of literature and providing a deeper understanding of this topic. This will be accomplished by providing a concise summary of the development of text-to-image synthesis, previous tools and systems employed in this field, key types of generative models, as well as an exploration of the relevant research conducted on GANs and diffusion models. Additionally, the study provides an overview of common datasets utilized for training the text-to-image model, compares the evaluation metrics used for evaluating the models, and addresses the challenges encountered in the field. Finally, concluding remarks are provided to summarize the findings and implications of the study and open issues for further research.


Pages: 24412-24427

Page count: 16

