Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Cited by: 2
Authors
Dost, Katharina [1]
Duncanson, Hamish [1]
Ziogas, Ioannis [2]
Riddle, Patricia [1]
Wicker, Jorg [1]
Affiliations
[1] University of Auckland, Auckland, New Zealand
[2] University of Mississippi, Oxford, MS, USA
DOI
10.1007/978-3-031-05936-0_12
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning can help overcome human biases in decision-making by focusing on purely logical conclusions drawn from the training data. If the training data is biased, however, that bias is transferred to the model and remains undetected, because performance is validated on a test set drawn from the same biased distribution. Existing strategies for identifying and mitigating selection bias generally rely on some knowledge of either the bias or the ground truth. An exception is the Imitate algorithm, which assumes no such knowledge but has a strong limitation: it can only model datasets with a single normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique can model multi-cluster datasets and therefore applies to a substantially wider range of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets, allowing them to be corrected early on, but also improves classifier performance.
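To make the "divide" idea concrete, the sketch below fits a Gaussian mixture to each class, picks the number of components by the Bayesian Information Criterion (BIC), and assigns every point to its most likely component. Each resulting cluster is then roughly a single multivariate Gaussian, i.e. the kind of input a one-cluster method such as Imitate expects. This is a minimal illustration under stated assumptions, not the authors' Mimic implementation: the function names (`e_step`, `fit_gmm`, `bic`, `divide`), the BIC-based model selection, the farthest-first initialisation, and the `max_k` limit are all hypothetical choices, and the per-cluster bias detection itself is not reproduced.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log-density of each row of X under a multivariate normal N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def e_step(X, weights, means, covs):
    """Return responsibilities and total log-likelihood (log-sum-exp for stability)."""
    log_p = np.stack([np.log(w) + log_gauss(X, mu, S)
                      for w, mu, S in zip(weights, means, covs)], axis=1)
    m = log_p.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    return np.exp(log_p - log_norm), float(log_norm.sum())


def fit_gmm(X, k, n_iter=200, seed=0):
    """Fit a k-component full-covariance Gaussian mixture with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    reg = 1e-6 * np.eye(d)  # small ridge keeps covariances invertible
    # Assumption: farthest-first initialisation so initial means spread across clusters
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        sq = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(sq.min(axis=1).argmax()))
    means = X[idx].astype(float)
    covs = np.array([np.atleast_2d(np.cov(X, rowvar=False)) + reg for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp, _ = e_step(X, weights, means, covs)
        # M-step: re-estimate mixture weights, means and covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg
    _, log_lik = e_step(X, weights, means, covs)
    return weights, means, covs, log_lik


def bic(X, k, log_lik):
    """BIC for a full-covariance mixture: lower is better."""
    n, d = X.shape
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * log_lik + n_params * np.log(n)


def divide(X, y, max_k=3, seed=0):
    """Split each class into Gaussian clusters; returns {(label, cluster_id): rows}."""
    clusters = {}
    for label in np.unique(y):
        Xc = X[y == label]
        best = None
        for k in range(1, max_k + 1):
            params = fit_gmm(Xc, k, seed=seed)
            score = bic(Xc, k, params[-1])
            if best is None or score < best[0]:
                best = (score, k, params)
        _, k, (weights, means, covs, _) = best
        resp, _ = e_step(Xc, weights, means, covs)
        assign = resp.argmax(axis=1)  # hard assignment to the most likely component
        for j in range(k):
            if np.any(assign == j):
                clusters[(label.item(), j)] = Xc[assign == j]
    return clusters
```

For example, a class made of two well-separated blobs should come back as two clusters, while a single-blob class stays whole; each cluster could then be inspected for under-represented regions by a single-Gaussian method. BIC is used here only because it is a common, simple way to trade fit against model size; the paper may select the number of clusters differently.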
Pages: 149-160
Page count: 12