Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

被引:0
|
作者
Yamac, Aylin [1 ]
Genc, Dilan [1 ]
Zaman, Esra [1 ]
Gerschner, Felix [1 ]
Klaiber, Marco [1 ]
Theissler, Andreas [1 ]
机构
[1] Aalen Univ Appl Sci, Aalen, Germany
关键词
text-to-image; open-source; weaknesses;
D O I
10.1109/COMPSAC61105.2024.00261
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, there are still weaknesses in the generation of images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics ClipScore, Frechet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all models show predominantly unsatisfactory performance, and the identified weaknesses were confirmed.
引用
收藏
页码:1659 / 1664
页数:6
相关论文
共 50 条
  • [31] LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
    Lu, Yujie
    Yang, Xianjun
    Li, Xiujun
    Wang, Xin Eric
    Wang, William Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [32] SINE: SINgle Image Editing with Text-to-Image Diffusion Models
    Zhang, Zhixing
    Han, Ligong
    Ghosh, Arnab
    Metaxas, Dimitris
    Ren, Jian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6027 - 6037
  • [33] FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
    Zhao, Lin
    Zhao, Tianchen
    Lin, Zinan
    Ning, Xuefei
    Dai, Guohao
    Yang, Huazhong
    Wan, Yu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 16122 - 16131
  • [34] Visual Programming for Text-to-Image Generation and Evaluation
    Cho, Jaemin
    Zala, Abhay
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
    Hoe, Jiun Tian
    Jiang, Xudong
    Chan, Chee Seng
    Tan, Yap-Peng
    Hu, Weipeng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 6180 - 6189
  • [36] ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch
    Lefranc, Lith
    Van Damme, Ilja
    Clerice, Thibault
    Kestemont, Mike
    JOURNAL OF OPEN HUMANITIES DATA, 2024, 10
  • [37] Advancements in adversarial generative text-to-image models: a review
    Zaghloul, Rawan
    Rawashdeh, Enas
    Bani-Ata, Tomader
    IMAGING SCIENCE JOURNAL, 2024,
  • [38] Discriminative Class Tokens for Text-to-Image Diffusion Models
    Schwartz, Idan
    Snaebjarnarson, Vesteinn
    Chefer, Hila
    Belongie, Serge
    Wolf, Lior
    Benaim, Sagie
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22668 - 22678
  • [39] Adding Conditional Control to Text-to-Image Diffusion Models
    Zhang, Lvmin
    Rao, Anyi
    Agrawala, Maneesh
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3813 - 3824
  • [40] Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library
    Tarride, Solene
    Schneider, Yoann
    Generali-Lince, Marie
    Boillet, Melodie
    Abadie, Bastien
    Kermorvant, Christopher
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 387 - 404