Unsupervised Graph-Based Entity Resolution for Complex Entities

Cited by: 5
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Sch Comp, Canberra, ACT 2600, Australia
Keywords
Record linkage; data linkage; data cleaning; dependency graph; temporal data; ambiguity; INTEGRATION; WORLD
DOI
10.1145/3533016
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this article, we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that, however, belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records. 
We conduct extensive experiments on seven real-world datasets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.
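The second contribution in the abstract (negative evidence through temporal and link constraints) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Record` fields, the event types ("birth", "marriage", "death"), the precomputed similarity scores, and the 0.8 threshold are all hypothetical and chosen only for illustration. The sketch greedily merges the most similar candidate pairs and skips any merge that would give one entity two birth or two death records (a link constraint) or an event outside its birth-to-death span (a temporal constraint).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rid: str
    name: str
    year: int
    event: str  # hypothetical event types: "birth", "marriage", "death"


def violates_constraints(cluster):
    """Return True if the records cannot all belong to one person."""
    births = [r for r in cluster if r.event == "birth"]
    deaths = [r for r in cluster if r.event == "death"]
    # Link constraint: at most one birth and one death record per entity.
    if len(births) > 1 or len(deaths) > 1:
        return True
    # Temporal constraint: every event lies between birth and death.
    first = births[0].year if births else min(r.year for r in cluster)
    last = deaths[0].year if deaths else max(r.year for r in cluster)
    return any(r.year < first or r.year > last for r in cluster)


def link(records, scored_pairs, threshold=0.8):
    """Greedily merge the most similar pairs that satisfy the constraints."""
    cluster_of = {r.rid: [r] for r in records}
    for sim, a, b in sorted(scored_pairs, reverse=True):
        if sim < threshold:
            break
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already linked
        merged = ca + cb
        if violates_constraints(merged):
            continue  # negative evidence: reject this link
        for r in merged:
            cluster_of[r.rid] = merged
    unique = {id(c): c for c in cluster_of.values()}
    return sorted(sorted(r.rid for r in c) for c in unique.values())


# Toy data: four records that all share the same name.
records = [
    Record("b1", "Mary Smith", 1850, "birth"),
    Record("b2", "Mary Smith", 1851, "birth"),
    Record("m1", "Mary Smith", 1872, "marriage"),
    Record("d1", "Mary Smith", 1860, "death"),
]
# (similarity, record id, record id); the scores are made up.
scored_pairs = [
    (0.95, "b1", "m1"),
    (0.92, "b1", "b2"),
    (0.90, "m1", "d1"),
    (0.85, "b1", "d1"),
]
clusters = link(records, scored_pairs)
print(clusters)
```

In this toy data all four records share a name, so attribute similarity alone would link them. The constraints keep the two birth records apart and reject the 1860 death record, because the cluster already contains a marriage in 1872.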
Pages: 30
Related papers
50 in total
  • [1] ModER: Graph-based Unsupervised Entity Resolution using Composite Modularity Optimization and Locality Sensitive Hashing
    Ebeid, Islam Akef
    Talburt, John R.
    Hagan, Nicholas Kofi Akortia
    Siddique, Md Abdus Salam
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 1 - 18
  • [2] Unsupervised Entity Resolution With Blocking and Graph Algorithms
    Zhang, Dongxiang
    Li, Dongsheng
    Guo, Long
    Tan, Kian-Lee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (03) : 1501 - 1515
  • [3] GB-JER: A Graph-Based Model for Joint Entity Resolution
    Sun, Chenchen
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    Yu, Ge
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT1, 2015, 9049 : 458 - 473
  • [4] A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution
    Zhang, Dongxiang
    Guo, Long
    He, Xiangnan
    Shao, Jie
    Wu, Sai
    Shen, Heng Tao
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 713 - 724
  • [5] Graph-Based Named Entity Linking with Wikipedia
    Hachey, Ben
    Radford, Will
    Curran, James R.
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2011, 2011, 6997 : 213 - +
  • [6] Unsupervised Graph-based Discourse Planning and Generation
    Singh, Anjali
    Chatterjee, Niladri
    IETE TECHNICAL REVIEW, 2019, 36 (05) : 526 - 534
  • [7] GraphEDM: A Graph-Based Approach to Disambiguate Entities in Microposts
    Nerella, Prathyusha
    Bhardwaj, Akansha
    Rosso, Paolo
    Cudre-Mauroux, Philippe
    2021 8TH SWISS CONFERENCE ON DATA SCIENCE, SDS, 2021, : 20 - 25
  • [8] Graph-Based Joint Clustering of Fixations and Visual Entities
    Sugano, Yusuke
    Matsushita, Yasuyuki
    Sato, Yoichi
    ACM TRANSACTIONS ON APPLIED PERCEPTION, 2013, 10 (02)
  • [9] Graph-Based Bootstrapping for Coreference Resolution
    Balaji, J.
    Geetha, T.
    Ranjani, P.
    JOURNAL OF INTELLIGENT SYSTEMS, 2014, 23 (03) : 293 - 310
  • [10] Graph-Based Opinion Entity Ranking in Customer Reviews
    Chutmongkolporn, Kunuch
    Manaskasemsak, Bundit
    Rungsawang, Arnon
    2015 15TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2015, : 161 - 164