Gender Biases in Automatic Evaluation Metrics for Image Captioning

Cited by: 0
Authors
Qiu, Haoyi [1]
Dou, Zi-Yi [1]
Wang, Tianlu [2]
Celikyilmaz, Asli [2]
Peng, Nanyun [1]
Affiliations
[1] University of California, Los Angeles, Los Angeles, CA 90024, USA
[2] Meta AI Research, Menlo Park, CA, USA
Keywords
(none listed)
DOI
(not available)
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Model-based evaluation metrics (e.g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks. However, their impact on fairness remains largely unexplored. It is widely recognized that pretrained models can inadvertently encode societal biases, so employing these models for evaluation may perpetuate and amplify those biases. For example, an evaluation metric may favor the caption "a woman is calculating an account book" over "a man is calculating an account book," even if the image only shows male accountants. In this paper, we conduct a systematic study of gender biases in model-based automatic evaluation metrics for image captioning tasks. We start by curating a dataset comprising profession, activity, and object concepts with stereotypical gender associations. We then demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations, as well as the propagation of biases to generation models through reinforcement learning. Finally, we present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments. Our dataset and framework lay the foundation for understanding the potential harm of model-based evaluation metrics, and facilitate future work on developing more inclusive evaluation metrics.
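
As a concrete illustration of the failure mode the abstract describes, the sketch below scores a gender-swapped caption pair against the same image with a CLIPScore-style metric. This is a minimal sketch, not the paper's evaluation protocol: the checkpoint openai/clip-vit-base-patch32, the image file accountant.jpg, and the caption pair are illustrative assumptions, and the 2.5 * max(cosine, 0) scaling follows the standard CLIPScore definition (Hessel et al., 2021).

    # Minimal sketch (not the paper's protocol): probe a CLIPScore-style metric
    # with a gender-swapped caption pair on the same image. The checkpoint,
    # image path, and captions are illustrative assumptions.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clipscore(image: Image.Image, caption: str) -> float:
        # CLIPScore = 2.5 * max(cos(image_emb, text_emb), 0), per Hessel et al. (2021).
        inputs = processor(text=[caption], images=image,
                           return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        # The returned projections are L2-normalized; re-normalizing is a safe no-op.
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        return 2.5 * max((img * txt).sum().item(), 0.0)

    image = Image.open("accountant.jpg")  # hypothetical image showing male accountants
    s_man = clipscore(image, "a man is calculating an account book")
    s_woman = clipscore(image, "a woman is calculating an account book")
    print(f"man: {s_man:.3f}  woman: {s_woman:.3f}  gap: {s_woman - s_man:+.3f}")

A single pair proves nothing on its own; what would indicate a gendered preference in the metric is a gap that consistently points one way when aggregated over many images and over the profession, activity, and object concepts the paper curates.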
Pages: 8358-8375
Page count: 18