Bias Unveiled: Enhancing Fairness in German Word Embeddings with Large Language Models

Cited by: 0
Authors
Saeid, Yasser [1 ]
Kopinski, Thomas [1 ]
Affiliations
[1] South Westphalia Univ Appl Sci, Meschede, Germany
Source
Keywords
Stereotypical biases; Gender bias; Machine learning systems; Word embedding algorithms; Bias amplification; Embedding bias; Origins of bias; Specific training documents; Efficacy; Abating bias; Methodology; Insights; Matrix; German Wikipedia corpora; Empirical endeavor; Precision; Sources of bias; Equanimity; Impartiality; LLM
DOI
10.1007/978-3-031-78014-1_23
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Gender bias in word embedding algorithms has garnered significant attention because these embeddings are integrated into machine learning systems, where they can reinforce stereotypes. Despite ongoing efforts, the root causes of the biases that arise when training word embeddings, particularly for German, remain unclear. This research presents a novel approach to the problem, paving the way for new avenues of investigation. Our methodology comprehensively analyzes word embeddings, focusing on how manipulations of the training data affect the resulting biases. By examining how biases originate within specific training documents, we identify subsets whose removal effectively mitigates these effects. Additionally, we explore both conventional methods and new approaches that use large language models (LLMs) to ensure that generated text adheres to fairness criteria. Using few-shot prompting, we generate gender-bias-free text, with GPT-4 serving as a benchmark to evaluate the fairness of this process for German. Our method traces the intricate origins of biases within word embeddings and is validated through rigorous application to German Wikipedia corpora. Our findings demonstrate the efficacy of the method: removing certain document subsets significantly diminishes bias in the resulting embeddings, as detailed in the analysis "Unlocking the Limits: Document Removal with an Upper Bound" in the experimental results section. Ultimately, this research offers a practical framework for uncovering and mitigating biases in word embedding algorithms during training, with the goal of advancing machine learning systems that prioritize fairness and impartiality by revealing and addressing latent sources of bias.
Pages: 308-325
Number of pages: 18
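The abstract measures gender bias in German word embeddings without spelling out a metric in this record. As a point of reference, below is a minimal sketch of a common projection-based "direct bias" score in the spirit of Bolukbasi et al. (2016); the German word lists and the model path de_wiki_w2v.bin are illustrative assumptions, not the paper's published setup.

import numpy as np
from gensim.models import KeyedVectors

# Hypothetical pre-trained embeddings over German Wikipedia (path is an assumption).
kv = KeyedVectors.load_word2vec_format("de_wiki_w2v.bin", binary=True)

# German definitional pairs spanning the gender subspace (illustrative).
PAIRS = [("Mann", "Frau"), ("er", "sie"), ("Vater", "Mutter")]

def gender_direction(kv):
    """Average the normalized difference vectors of the definitional pairs."""
    diffs = []
    for m, f in PAIRS:
        d = kv[m] - kv[f]
        diffs.append(d / np.linalg.norm(d))
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def direct_bias(kv, words, g):
    """Mean absolute cosine similarity of supposedly neutral words with the gender axis."""
    scores = []
    for w in words:
        v = kv[w] / np.linalg.norm(kv[w])
        scores.append(abs(float(v @ g)))
    return float(np.mean(scores))

# Occupation terms that should ideally be gender-neutral (illustrative).
neutral = ["Arzt", "Lehrkraft", "Ingenieur", "Pflegekraft"]
g = gender_direction(kv)
print(f"direct bias: {direct_bias(kv, neutral, g):.4f}")

A score near zero means the occupation terms sit roughly orthogonal to the gender axis; larger values indicate stronger gender association.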
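The document-removal step the abstract calls "Unlocking the Limits: Document Removal with an Upper Bound" can be pictured as ranking training documents by their contribution to the measured bias and dropping the top-ranked subset before retraining. The leave-one-out heuristic below is an assumption for illustration (the paper's attribution method may differ); it reuses direct_bias, gender_direction, and neutral from the previous sketch.

from gensim.models import Word2Vec

def bias_of(corpus):
    """Train a small Word2Vec model on the corpus and score its bias.

    `corpus` is a list of documents, each a list of tokens; the
    definitional and neutral words are assumed to occur in it.
    """
    model = Word2Vec(sentences=corpus, vector_size=100, min_count=1,
                     workers=1, seed=0)
    return direct_bias(model.wv, neutral, gender_direction(model.wv))

def rank_documents(corpus):
    """Leave-one-out: how much does removing each document reduce the bias?

    Retrains once per document, so this is only practical for toy
    corpora; cheaper attribution would be needed at Wikipedia scale.
    """
    base = bias_of(corpus)
    deltas = [(base - bias_of(corpus[:i] + corpus[i + 1:]), i)
              for i in range(len(corpus))]
    return sorted(deltas, reverse=True)  # largest bias reduction first

def remove_up_to_bound(corpus, k):
    """Drop at most k of the most bias-inducing documents (the upper bound)."""
    drop = {i for _, i in rank_documents(corpus)[:k]}
    return [doc for i, doc in enumerate(corpus) if i not in drop]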
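For the LLM step, the abstract describes few-shot prompting with GPT-4 to generate gender-bias-free German text. The following is a minimal sketch using the OpenAI chat API; the system prompt and example pairs are illustrative assumptions, not the paper's actual prompts.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hand-written few-shot pairs: stereotyped input -> gender-fair rewrite (illustrative).
FEW_SHOT = [
    ("Jeder Arzt sollte seine Patienten ernst nehmen.",
     "Ärztinnen und Ärzte sollten ihre Patientinnen und Patienten ernst nehmen."),
    ("Eine Krankenschwester kümmert sich um die Kranken.",
     "Pflegekräfte kümmern sich um die Kranken."),
]

def rewrite_gender_fair(text: str) -> str:
    """Ask GPT-4 to rewrite a German sentence without gender stereotypes."""
    messages = [{"role": "system",
                 "content": "Formuliere deutsche Sätze geschlechtergerecht um, "
                            "ohne den Inhalt zu verändern."}]
    for biased, fair in FEW_SHOT:
        messages.append({"role": "user", "content": biased})
        messages.append({"role": "assistant", "content": fair})
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model="gpt-4", messages=messages,
                                          temperature=0)
    return resp.choices[0].message.content

print(rewrite_gender_fair("Ein Ingenieur löst seine Probleme logisch."))

Text rewritten this way could then feed back into embedding training, with GPT-4 also serving as the fairness benchmark the abstract mentions.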