Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Authors
Yamac, Aylin [1]
Genc, Dilan [1]
Zaman, Esra [1]
Gerschner, Felix [1]
Klaiber, Marco [1]
Theissler, Andreas [1]
Affiliations
[1] Aalen Univ Appl Sci, Aalen, Germany
Keywords
text-to-image; open-source; weaknesses
DOI
10.1109/COMPSAC61105.2024.00261
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics CLIPScore, Fréchet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
Pages: 1659-1664
Number of pages: 6