Gender Biases in Automatic Evaluation Metrics for Image Captioning

Cited by: 0
Authors
Qiu, Haoyi [1]
Dou, Zi-Yi [1]
Wang, Tianlu [2]
Celikyilmaz, Asli [2]
Peng, Nanyun [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Meta AI Res, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks. However, their impact on fairness remains largely unexplored. It is widely recognized that pretrained models can inadvertently encode societal biases, so employing these models for evaluation may perpetuate and amplify those biases. For example, an evaluation metric may favor the caption "a woman is calculating an account book" over "a man is calculating an account book," even if the image only shows male accountants. In this paper, we conduct a systematic study of gender biases in model-based automatic evaluation metrics for image captioning tasks. We start by curating a dataset comprising profession, activity, and object concepts with stereotypical gender associations. Then, we demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations, as well as the propagation of biases to generation models through reinforcement learning. Finally, we present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments. Our dataset and framework lay the foundation for understanding the potential harm of model-based evaluation metrics and facilitate future work on more inclusive evaluation metrics.
Pages: 8358-8375
Page count: 18