A fair-multicluster approach to clustering of categorical data

Authors
Carlos Santos-Mangudo
Antonio J. Heras
Affiliation
[1] Complutense University of Madrid,Financial and Actuarial Economics and Statistics Department
Keywords
Clustering; Fairness; Fair clustering; Categorical data
DOI
Not available
Abstract
In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so an increase in fairness usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted to obtain homogeneous and fair clusters.
Pages: 583-604
Number of pages: 21