Bias Unveiled: Enhancing Fairness in German Word Embeddings with Large Language Models

Cited by: 0
Authors: Saeid, Yasser [1]; Kopinski, Thomas [1]
Affiliations: [1] South Westphalia Univ Appl Sci, Meschede, Germany
Keywords: Stereotypical biases; Gender bias; Machine learning systems; Word embedding algorithms; Bias amplification; Embedding bias; Origins of bias; Specific training documents; Efficacy; Abating bias; Methodology; Insights; Matrix; German Wikipedia corpora; Empirical endeavor; Precision; Sources of bias; Equanimity; Impartiality; LLM
DOI: 10.1007/978-3-031-78014-1_23
Chinese Library Classification: O42 [Acoustics]
Subject classification codes: 070206; 082403
Abstract:
Gender bias in word embedding algorithms has garnered significant attention because these embeddings are integrated into machine learning systems and can reinforce stereotypes. Despite ongoing efforts, the root causes of biases in training word embeddings, specifically for the German language, remain unclear. This research presents a novel approach to tackling this problem, paving the way for new avenues of investigation. Our methodology involves a comprehensive analysis of word embeddings, focusing on how training data manipulations impact resulting biases. By examining how biases originate within specific training documents, we identify subsets that can be removed to effectively mitigate these effects. Additionally, we explore both conventional methods and new approaches using large language models (LLMs) to ensure the generated text adheres to concepts of fairness. Using few-shot prompting, we generate gender bias-free text, employing GPT-4 as a benchmark to evaluate the fairness of this process for the German language. Our method explains the intricate origins of biases within word embeddings, validated through rigorous application to German Wikipedia corpora. Our findings robustly demonstrate the efficacy of our method, showing that removing certain document subsets significantly diminishes bias in word embeddings. This is further detailed in our analysis, "Unlocking the Limits: Document Removal with an Upper Bound," in the experimental results section. Ultimately, this research presents a practical framework to uncover and mitigate biases in word embedding algorithms during training. Our goal is to advance machine learning systems that prioritize fairness and impartiality by revealing and addressing latent sources of bias.
Pages: 308-325
Page count: 18
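
To make the bias-measurement step described in the abstract concrete, the following minimal Python sketch computes a cosine-based direct-bias score for German word embeddings along an "er"/"sie" (he/she) gender direction. The function names, the toy vocabulary, the random vectors, and the score definition are illustrative assumptions, not the authors' published procedure; in the paper's setting such a score would be computed on embeddings trained on German Wikipedia, before and after removing candidate document subsets.

```python
# Minimal illustrative sketch (assumptions, not the authors' exact method):
# a cosine-based "direct bias" score for German word embeddings, measured
# along an "er"/"sie" (he/she) gender direction.
import numpy as np


def gender_direction(emb: dict) -> np.ndarray:
    """Unit vector pointing from the feminine to the masculine anchor word."""
    d = emb["er"] - emb["sie"]
    return d / np.linalg.norm(d)


def direct_bias(emb: dict, neutral_words: list) -> float:
    """Mean absolute cosine similarity of gender-neutral words to the gender axis."""
    g = gender_direction(emb)
    sims = []
    for w in neutral_words:
        v = emb[w] / np.linalg.norm(emb[w])
        sims.append(abs(float(np.dot(v, g))))
    return float(np.mean(sims))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    # Toy 50-dimensional vectors standing in for embeddings trained on German Wikipedia.
    vocab = ["er", "sie", "Ingenieur", "Pflegekraft", "Lehrkraft", "Arzt"]
    emb = {w: rng.normal(size=50) for w in vocab}
    neutral = ["Ingenieur", "Pflegekraft", "Lehrkraft", "Arzt"]
    # A drop in this score after retraining without a candidate document subset
    # would correspond to the bias reduction the abstract describes.
    print(f"direct bias score: {direct_bias(emb, neutral):.3f}")
```

The few-shot GPT-4 rewriting step mentioned in the abstract is a separate component and is not shown in this sketch.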