Unsupervised Graph-Based Entity Resolution for Complex Entities

Cited by: 5
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Sch Comp, Canberra, ACT 2600, Australia
Keywords
Record linkage; data linkage; data cleaning; dependency graph; temporal data; ambiguity; INTEGRATION; WORLD;
DOI
10.1145/3533016
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this article, we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that, however, belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records. 
We conduct extensive experiments on seven real-world datasets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.
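As a rough illustration of two ideas named in the abstract (comparing attribute values to score record pairs, and using a temporal constraint as negative evidence to rule out candidate pairs), the following minimal Python sketch may help. It is not the authors' framework: the `Record` fields, the `max_gap` and `threshold` values, and the use of `difflib.SequenceMatcher` with a simple transitive-closure clustering are all assumptions chosen for brevity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Record:
    rec_id: str
    name: str
    birth_year: int  # hypothetical attribute used for the temporal constraint


def name_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def violates_temporal_constraint(r1: Record, r2: Record, max_gap: int = 2) -> bool:
    """Negative evidence (assumed rule): birth years too far apart cannot match."""
    return abs(r1.birth_year - r2.birth_year) > max_gap


def link_records(records, threshold: float = 0.8):
    """Link similar, temporally consistent pairs, then cluster by transitive closure."""
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r1, r2 in combinations(records, 2):
        if violates_temporal_constraint(r1, r2):
            continue  # constraint removes this candidate pair before comparison
        if name_similarity(r1.name, r2.name) >= threshold:
            parent[find(r1.rec_id)] = find(r2.rec_id)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r.rec_id), set()).add(r.rec_id)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Invented example data, not taken from the paper's datasets.
    records = [
        Record("r1", "Mary Smith", 1850),
        Record("r2", "Mary Smyth", 1851),
        Record("r3", "Mary Smith", 1890),
        Record("r4", "John Brown", 1850),
    ]
    print(link_records(records))
```

In this toy data, `r1` and `r3` share an identical name but are kept apart by the assumed birth-year constraint, while `r1` and `r2` are linked despite a spelling variation. The framework described in the abstract goes further (propagating changed attribute values, weighting by value ambiguity, exploiting relationship structure, and refining clusters with graph measures), none of which this sketch attempts.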
Pages: 30