Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis

Cited by: 0
Authors
Goldfarb-Tarrant, Seraphina [1, 2]
Ross, Bjorn [1]
Lopez, Adam [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Cohere, Toronto, ON, Canada
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Sentiment analysis (SA) systems are widely deployed in many of the world's languages, and there is well-documented evidence of demographic bias in these systems. In languages beyond English, scarcer training data is often supplemented with transfer learning using pre-trained models, including multilingual models trained on other languages. In some cases, even supervision data comes from other languages. Does cross-lingual transfer also import new biases? To answer this question, we use counterfactual evaluation to test whether gender or racial biases are imported when using cross-lingual transfer, compared to a monolingual transfer setting. Across five languages, we find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts. We also find racial biases to be much more prevalent than gender biases. To spur further research on this topic, we release the sentiment models we used for this study, and the intermediate checkpoints throughout training, yielding 1,525 distinct models; we also release our evaluation code.
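As a rough illustration of the counterfactual evaluation mentioned in the abstract, the sketch below fills sentence templates with terms from two demographic groups and compares the sentiment scores. This is not the authors' released evaluation code: the function `counterfactual_gaps`, the `{person}` template slot, the example terms, and the deliberately biased `toy_score` lexicon scorer are all hypothetical stand-ins chosen for this example. A real study would plug in a trained sentiment model in place of `toy_score`.

```python
from statistics import mean
from typing import Callable, List


def counterfactual_gaps(
    score: Callable[[str], float],
    templates: List[str],
    group_a: List[str],
    group_b: List[str],
) -> List[float]:
    """Return, per template, mean score for group A minus mean score for group B.

    Each template contains a {person} slot. Sentences that differ only in the
    demographic term should ideally receive the same score, so a gap far from
    zero suggests the scorer treats the two groups differently.
    """
    gaps = []
    for template in templates:
        a = mean(score(template.format(person=term)) for term in group_a)
        b = mean(score(template.format(person=term)) for term in group_b)
        gaps.append(a - b)
    return gaps


def toy_score(text: str) -> float:
    """Hypothetical lexicon scorer, used only so the example runs.

    The entry for "brother" is an artificial bias injected on purpose.
    """
    lexicon = {"great": 1.0, "terrible": -1.0, "brother": 0.5}
    words = (word.strip(".,!?") for word in text.lower().split())
    return sum(lexicon.get(word, 0.0) for word in words)


templates = ["{person} feels great.", "{person} made me feel terrible."]
male_terms = ["he", "my brother"]      # assumed example terms
female_terms = ["she", "my sister"]    # assumed example terms

gaps = counterfactual_gaps(toy_score, templates, male_terms, female_terms)
print(gaps)
```

Comparing the gaps of a cross-lingual model with those of a monolingual model on the same templates would then indicate which of the two is more biased under this measure.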
Pages: 5691-5704
Page count: 14
Related papers
50 in total
  • [41] Cross-Lingual Blog Analysis by Cross-Lingual Comparison of Characteristic Terms and Blog Posts
    Nakasaki, Hiroyuki
    Kawaba, Mariko
    Utsuro, Takehito
    Fukuhara, Tomohiro
    Nakagawa, Hiroshi
    Kando, Noriko
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION, 2008, : 105 - +
  • [42] Towards Cross-Lingual Generalization of Translation Gender Bias
    Cho, Won Ik
    Kim, Jiwon
    Yang, Jaeyeong
    Kim, Nam Soo
    PROCEEDINGS OF THE 2021 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, FACCT 2021, 2021, : 449 - 457
  • [43] Cross-lingual sense determination: Can it work?
    Ide, N
    COMPUTERS AND THE HUMANITIES, 2000, 34 (1-2): : 223 - 234
  • [44] An Unsupervised Cross-Lingual Topic Model Framework for Sentiment Classification
    Lin, Zheng
    Jin, Xiaolong
    Xu, Xueke
    Wang, Yuanzhuo
    Cheng, Xueqi
    Wang, Weiping
    Meng, Dan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (03) : 432 - 444
  • [45] A cross-lingual sentiment topic model evolution over time
    Musa, Ibrahim Hussein
    Xu, Kang
    Liu, Feng
    Zamit, Ibrahim
    Abro, Waheed Ahmed
    Qi, Guilin
    INTELLIGENT DATA ANALYSIS, 2020, 24 (02) : 253 - 266
  • [46] Cross-Lingual Sentiment Classification with Bilingual Document Representation Learning
    Zhou, Xinjie
    Wan, Xiaojun
    Xiao, Jianguo
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1403 - 1412
  • [47] Cross-Lingual Sense Determination: Can It Work?
    Nancy Ide
    Computers and the Humanities, 2000, 34 : 223 - 234
  • [48] Translation Artifacts in Cross-lingual Transfer Learning
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7674 - 7684
  • [49] Cross-Lingual Knowledge Transfer for Clinical Phenotyping
    Papaioannou, Jens-Michalis
    Grundmann, Paul
    van Aken, Betty
    Samaras, Athanasios
    Kyparissidis, Ilias
    Giannakoulas, George
    Gers, Felix
    Loeser, Alexander
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 900 - 909
  • [50] Choosing Transfer Languages for Cross-Lingual Learning
    Lin, Yu-Hsiang
    Chen, Chian-Yu
    Lee, Jean
    Li, Zirui
    Zhang, Yuyan
    Xia, Mengzhou
    Rijhwani, Shruti
    He, Junxian
    Zhang, Zhisong
    Ma, Xuezhe
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3125 - 3135