Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Cited by: 0

Authors:

Yamac, Aylin [1]

Genc, Dilan [1]

Zaman, Esra [1]

Gerschner, Felix [1]

Klaiber, Marco [1]

Theissler, Andreas [1]

Affiliations:

[1] Aalen Univ Appl Sci, Aalen, Germany

Source:

2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024

Keywords:

text-to-image; open-source; weaknesses

DOI:

10.1109/COMPSAC61105.2024.00261

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics ClipScore, Frechet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
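
As an illustration of the alignment metric named in the abstract, the following is a minimal sketch of the commonly cited CLIPScore formulation, w * max(cos(image, text), 0) with w = 2.5. It is not taken from the paper, and the exact variant and scaling used in the study are not stated in this record. The function names are hypothetical, and the short vectors stand in for embeddings that would normally come from a CLIP image encoder and text encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding, weight=2.5):
    """CLIPScore-style alignment: weight * max(cosine similarity, 0).

    Assumption: weight=2.5 follows the commonly cited definition; some
    libraries scale the cosine similarity differently.
    """
    similarity = cosine_similarity(image_embedding, text_embedding)
    return weight * max(similarity, 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for real CLIP embeddings (assumption).
    image_vec = [0.2, 0.7, 0.1]
    prompt_vec = [0.1, 0.8, 0.3]
    print(clip_score(image_vec, prompt_vec))
```

A higher score indicates that the generated image and its prompt lie closer together in the shared embedding space. Negative similarities are clipped to zero, so unrelated pairs are not ranked below one another.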


Pages: 1659-1664

Page count: 6
