Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Authors
Yamac, Aylin [1]
Genc, Dilan [1]
Zaman, Esra [1]
Gerschner, Felix [1]
Klaiber, Marco [1]
Theissler, Andreas [1]
Affiliation
[1] Aalen Univ Appl Sci, Aalen, Germany
Keywords
text-to-image; open-source; weaknesses
DOI
10.1109/COMPSAC61105.2024.00261
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics ClipScore, Fréchet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
Pages: 1659-1664
Number of pages: 6
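
Illustrative note
The abstract names ClipScore as one of the alignment metrics. As a rough illustration only, the sketch below computes a score of that kind from two precomputed embedding vectors, using the commonly cited form w * max(cos(image, text), 0) with w = 2.5. The function name `clip_score`, the toy vectors, and the choice of w are assumptions made here; the record does not state which scaling or implementation the authors used, and real embeddings would come from a CLIP model rather than hand-written lists.

```python
import math


def clip_score(image_emb, text_emb, w=2.5):
    """Hypothetical helper: rescaled, clipped cosine similarity.

    image_emb and text_emb are plain lists of floats standing in for
    CLIP image and text embeddings (an assumption for this sketch).
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same length")

    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        raise ValueError("embeddings must be non-zero")

    cosine = dot / (norm_image * norm_text)
    # Negative similarities are clipped to zero before rescaling.
    return w * max(cosine, 0.0)


if __name__ == "__main__":
    # Toy vectors, not real model outputs.
    print(clip_score([0.2, 0.9, 0.1], [0.3, 0.8, 0.0]))
```

A higher value indicates that the image embedding points in a direction closer to the text embedding; comparing models would mean averaging such scores over the same set of prompts.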