Investigating Semi-Automatic Assessment of Data Sets Fairness by Means of Fuzzy Logic

Cited by: 1
Authors
Gallese, Chiara [1]
Scantamburlo, Teresa [2]
Manzoni, Luca [3]
Nobile, Marco S. [4, 5]
Affiliations
[1] Eindhoven Univ Technol, Dept Elect Engn, Eindhoven, Netherlands
[2] Ca Foscari Univ Venice, European Ctr Living Technol, Dept Environm Sci Informat & Stat, Venice, Italy
[3] Univ Trieste, Dept Math & Geosci, Trieste, Italy
[4] Ca Foscari Univ Venice, Dept Environm Sci Informat & Stat, Venice, Italy
[5] Eindhoven Univ Technol, Dept Ind Engn & Innovat Sci, Eindhoven, Netherlands
Keywords
Data Bias; Fairness; Trustworthy Artificial Intelligence; Fuzzy Logic
DOI
10.1109/CIBCB56990.2023.10264913
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Research has shown that data sets can convey social bias into AI systems, especially those based on machine learning. A biased data set is not representative of reality and may help perpetuate societal biases within the model. To tackle this problem, it is important to understand how to avoid biases, errors, and unethical practices when creating data sets. In this work we offer a preliminary framework for the semi-automated evaluation of fairness in data sets, combining statistical information about the data with qualitative considerations. We address the question of how much (un)fairness a data set used for machine learning research may contain, focusing on classification problems. To provide guidance for the use of data sets in critical decision-making contexts, such as health decisions, we identify six fundamental features (balance, numerosity, unevenness, compliance, quality, incompleteness) that could affect model fairness. We developed a rule-based approach based on fuzzy logic that combines these characteristics into a single score and enables a semi-automatic evaluation of a data set in algorithmic fairness research.
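The idea of combining the six features into one score with fuzzy rules can be illustrated with a minimal sketch. This is not the authors' rule base: the record above does not give their membership functions or rules, so the linear `high`/`low` memberships, the four rules, the [0, 1] input scaling, and the function name `fairness_score` are all assumptions made for illustration. The sketch uses a zero-order Sugeno-style scheme (min for AND, max for OR, weighted average of crisp rule outputs).

```python
from typing import Dict

# The six features named in the abstract. Scaling each to [0, 1] is an assumption.
FEATURES = ("balance", "numerosity", "unevenness",
            "compliance", "quality", "incompleteness")


def high(x: float) -> float:
    """Hypothetical linear membership in the fuzzy set 'high', clipped to [0, 1]."""
    return min(1.0, max(0.0, x))


def low(x: float) -> float:
    """Hypothetical membership in 'low', defined as the complement of 'high'."""
    return 1.0 - high(x)


def fairness_score(f: Dict[str, float]) -> float:
    """Combine the six features into one score in [0, 1] (1 = most favourable).

    The rules are illustrative only, not taken from the paper.
    """
    missing = [name for name in FEATURES if name not in f]
    if missing:
        raise ValueError(f"missing features: {missing}")

    # Each rule is (firing strength, crisp output); AND = min, OR = max.
    rules = [
        # IF balance is high AND numerosity is high THEN fairness is high
        (min(high(f["balance"]), high(f["numerosity"])), 1.0),
        # IF compliance is high AND quality is high THEN fairness is high
        (min(high(f["compliance"]), high(f["quality"])), 1.0),
        # IF unevenness is high OR incompleteness is high THEN fairness is low
        (max(high(f["unevenness"]), high(f["incompleteness"])), 0.0),
        # IF balance is low OR compliance is low THEN fairness is low
        (max(low(f["balance"]), low(f["compliance"])), 0.0),
    ]

    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.5  # no rule fires: fall back to a neutral score
    return sum(strength * out for strength, out in rules) / total


if __name__ == "__main__":
    example = {"balance": 0.8, "numerosity": 0.6, "unevenness": 0.2,
               "compliance": 0.9, "quality": 0.7, "incompleteness": 0.1}
    print(round(fairness_score(example), 3))
```

In a semi-automatic setting, features such as balance or incompleteness could be computed from the data, while compliance and quality would likely be supplied by a human assessor; that split is likewise an assumption here.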
Pages: 106-115
Page count: 10