ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

Authors
Jha, Akshita [1 ,2 ]
Prabhakaran, Vinodkumar [2 ]
Denton, Remi [2 ]
Laszlo, Sarah [2 ]
Dave, Shachi [2 ]
Qadri, Rida [2 ]
Reddy, Chandan K. [1 ]
Dev, Sunipa [2 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Google Res, Mountain View, CA USA
Abstract
Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable evaluation of known nationality-based stereotypes in T2I models, across 135 nationalities. We enrich an existing textual stereotype resource by distinguishing between stereotypical associations that are more likely to have visual depictions, such as 'sombrero', from those that are less visually concrete, such as 'attractive'. We demonstrate ViSAGe's utility through a multi-faceted evaluation of T2I generations. First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is especially higher for identities from Africa, South America, and South East Asia. Second, we assess the stereotypical pull of visual depictions of identity groups, which reveals how the 'default' representations of all identity groups in ViSAGe have a pull towards stereotypical depictions, and that this pull is even more prominent for identity groups from the Global South. CONTENT WARNING: Some examples contain offensive stereotypes.
Pages: 12333-12347
Page count: 15