TREATS: Fairness-aware entity resolution over streaming data

Times cited: 0
Authors
Araujo, Tiago Brasileiro [1,2]
Efthymiou, Vasilis [3,4]
Christophides, Vassilis [5]
Pitoura, Evaggelia [6]
Stefanidis, Kostas [1]
Affiliations
[1] Tampere Univ, Tampere, Finland
[2] Fed Inst Paraiba, Soledade, Brazil
[3] Harokopio Univ Athens, Athens, Greece
[4] FORTH ICS, Iraklion, Greece
[5] ENSEA, ETIS, Paris, France
[6] Univ Ioannina, Ioannina, Greece
Keywords
Entity resolution; Streaming data; Fairness; Incremental processing; Distributed processing; Machine learning
DOI
10.1016/j.is.2024.102506
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The growing proliferation of information systems continuously generates large volumes of data from a variety of sources, such as web platforms, social networks, and multiple devices. These data often lack a defined schema, so they must first be consolidated and cleansed before they can be analyzed and knowledge can be extracted from them. In this context, Entity Resolution (ER) plays a crucial role: it identifies matching entities across different sources and thereby facilitates the integration of knowledge bases. However, ER is computationally expensive, and it becomes harder still in the streaming setting, where data arrive continuously. Moreover, few studies have addressed fairness in ER, that is, the absence of discrimination or bias against particular groups. Fairness criteria aim to mitigate the effects of data bias in ER systems, which requires more than optimizing accuracy alone, as has traditionally been done. To address these issues, this work presents TREATS, a schema-agnostic and fairness-aware ER workflow that processes streaming data incrementally. The proposed framework handles fairness constraints across multiple groups of interest, offering a resilient and equitable solution to these challenges. In the experimental evaluation, the proposed techniques and heuristics are compared against state-of-the-art approaches on five pairs of real-world data sources; the results show significant improvements in fairness without degrading effectiveness or efficiency in the streaming environment. In summary, this work aims to advance the ER field with a workflow that addresses both technical challenges and ethical concerns.
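This record does not describe how TREATS blocks, matches, or enforces fairness, so the following is only a hypothetical Python sketch of the two ideas the abstract names: incremental, schema-agnostic ER over a stream, and a fairness measure across groups. It assumes token blocking (every attribute value is tokenized regardless of attribute name), Jaccard similarity with an assumed threshold of 0.5, clean-clean matching between two sources, and, as an assumed fairness criterion, the gap in per-group recall over known true matches. The names `IncrementalER`, `process`, and `recall_gap` are illustrative and are not TREATS's API.

```python
import re
from collections import defaultdict


def tokens(entity: dict) -> set:
    """Schema-agnostic tokenization: split every attribute value, ignore attribute names."""
    toks = set()
    for value in entity.values():
        toks.update(t for t in re.split(r"\W+", str(value).lower()) if t)
    return toks


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalER:
    """Hypothetical incremental ER: token blocking + Jaccard matching over a stream.

    Assumes clean-clean ER, i.e., only entities from different sources can match.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold      # assumed match threshold
        self.index = defaultdict(set)   # token -> ids of entities seen so far
        self.profiles = {}              # id -> (source, token set)

    def process(self, eid, source, entity):
        """Match one arriving entity against earlier ones, then index it."""
        toks = tokens(entity)
        # Candidates are earlier entities sharing at least one token (token blocking).
        candidates = set()
        for t in toks:
            candidates |= self.index.get(t, set())
        matches = [
            cid for cid in candidates
            if self.profiles[cid][0] != source
            and jaccard(toks, self.profiles[cid][1]) >= self.threshold
        ]
        # Index the new entity so later arrivals can be compared with it.
        for t in toks:
            self.index[t].add(eid)
        self.profiles[eid] = (source, toks)
        return sorted(matches)


def recall_gap(predicted: set, truth: set, group_of: dict) -> float:
    """Assumed fairness criterion: max minus min recall across groups of true matches."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pair in truth:
        g = group_of[pair]
        totals[g] += 1
        if pair in predicted:
            hits[g] += 1
    if not totals:
        return 0.0
    recalls = [hits[g] / totals[g] for g in totals]
    return max(recalls) - min(recalls)


# Usage: the second record matches the first despite different attribute names.
er = IncrementalER(threshold=0.5)
print(er.process("a1", "A", {"name": "John Smith", "city": "Tampere"}))            # []
print(er.process("b1", "B", {"full_name": "john smith", "location": "Tampere"}))   # ['a1']
```

Each arriving entity is compared only with previously seen entities that share a token and is then added to the index, so the work per arrival depends on the size of its blocks rather than on the whole stream. A recall gap of 0.0 means every group's true matches are found at the same rate.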
Pages: 16
Related papers (10 of 50 shown)
  • [1] Fairness-aware Data Integration
    Mazilu, Lacramioara
    Paton, Norman W.
    Konstantinou, Nikolaos
    Fernandes, Alvaro A. A.
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2022, 14(4)
  • [2] Considerations on Fairness-aware Data Mining
    Kamishima, Toshihiro
    Akaho, Shotaro
    Asoh, Hideki
    Sakuma, Jun
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, pp. 378-385
  • [3] Empirical analysis of fairness-aware data segmentation
    Okura, Seiji
    Mohri, Takao
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, pp. 155-162
  • [4] Fairness-Aware Programming
    Albarghouthi, Aws
    Vinitsky, Samuel
    FAT*'19: PROCEEDINGS OF THE 2019 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2019, pp. 211-219
  • [5] Fairness-Aware PageRank
    Tsioutsiouliklis, Sotiris
    Pitoura, Evaggelia
    Tsaparas, Panayiotis
    Kleftakis, Ilias
    Mamoulis, Nikos
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, pp. 3815-3826
  • [6] Fairness-Aware Explainable Recommendation over Knowledge Graphs
    Fu, Zuohui
    Xian, Yikun
    Gao, Ruoyuan
    Zhao, Jieyu
    Huang, Qiaoying
    Ge, Yingqiang
    Xu, Shuyuan
    Geng, Shijie
    Shah, Chirag
    Zhang, Yongfeng
    de Melo, Gerard
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, pp. 69-78
  • [7] A FAIRNESS-AWARE SMOOTH RATE ADAPTATION APPROACH FOR DYNAMIC HTTP STREAMING
    Liu, Li
    Zhou, Chao
    Zhang, Xinggong
    Guo, Zongming
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, pp. 4501-4505
  • [8] Tailoring Data Source Distributions for Fairness-aware Data Integration
    Nargesian, Fatemeh
    Asudeh, Abolfazl
    Jagadish, H. V.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14(11), pp. 2519-2532
  • [9] Fairness-Aware PAC Learning from Corrupted Data
    Konstantinov, Nikola
    Lampert, Christoph H.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [10] Fairness-Aware Range Queries for Selecting Unbiased Data
    Shetiya, Suraj
    Swift, Ian P.
    Asudeh, Abolfazl
    Das, Gautam
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, pp. 1423-1436