Open-Source Text-to-Image Models: Evaluation using Metrics and Human Perception

Cited by: 0
Authors
Yamac, Aylin [1 ]
Genc, Dilan [1 ]
Zaman, Esra [1 ]
Gerschner, Felix [1 ]
Klaiber, Marco [1 ]
Theissler, Andreas [1 ]
Affiliation
[1] Aalen Univ Appl Sci, Aalen, Germany
Keywords
text-to-image; open-source; weaknesses
DOI
10.1109/COMPSAC61105.2024.00261
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Text-to-image models, which aim to convert text input into images, have gained popularity partly due to their flexibility and user-friendliness. However, they still show weaknesses when generating images intended to display emotions, visual text, multiple objects, relative positioning, and attribute binding. This study analyzes the weaknesses of three open-source models: Stable Diffusion v2-1, Openjourney, and Dreamlike Photoreal 2.0. The models are compared based on scores for quality, alignment, and aesthetics. The evaluation is based on (a) the metrics CLIPScore, Fréchet Inception Distance (FID), and Large-scale Artificial Intelligence Open Network (LAION) and (b) human perception obtained in user surveys. The evaluation revealed that all three models perform predominantly unsatisfactorily on these cases, confirming the identified weaknesses.
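The two automatic metrics named in the abstract can be sketched as follows. This is an illustrative NumPy/SciPy implementation operating on precomputed embeddings and features; the function names and the 100 · max(cos, 0) CLIPScore scaling follow common usage (e.g. torchmetrics), not necessarily the paper's exact setup:

```python
import numpy as np
from scipy import linalg

def clip_score(img_emb: np.ndarray, txt_emb: np.ndarray) -> float:
    """CLIPScore: 100 * max(cosine similarity, 0) between the CLIP
    embedding of a generated image and the CLIP embedding of its prompt."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(max(100.0 * float(np.dot(img, txt)), 0.0))

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Fréchet Inception Distance between two (N, D) feature arrays
    (typically Inception-v3 activations for real and generated images):
    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 * (S_r S_g)^{1/2})."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sig_r = np.cov(real_feats, rowvar=False)
    sig_g = np.cov(gen_feats, rowvar=False)
    covmean = linalg.sqrtm(sig_r @ sig_g)
    if np.iscomplexobj(covmean):  # drop tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sig_r + sig_g - 2.0 * covmean))
```

In practice the embeddings come from a CLIP model and the features from Inception-v3; as a sanity check, identical feature sets give an FID near 0, and an image embedding compared with itself gives a CLIPScore of 100.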
Pages: 1659-1664
Page count: 6