Unsupervised Graph-Based Entity Resolution for Complex Entities

Cited by: 5
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian Natl Univ, Sch Comp, Canberra, ACT 2600, Australia
Keywords
Record linkage; data linkage; data cleaning; dependency graph; temporal data; ambiguity; integration; world
DOI
10.1145/3533016
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this article, we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate similar records that, however, belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records. 
We conduct extensive experiments on seven real-world datasets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.
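The abstract's second and third contributions (temporal constraints as negative evidence, and attribute-value ambiguity) can be illustrated with a small Python sketch. This is not the authors' implementation: the record fields, the role names, the `ROLE_ORDER` table, the frequency-based ambiguity weight, and the 0.6 threshold are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Record:
    rid: int
    name: str
    role: str  # hypothetical roles: "baby", "bride", "deceased"
    year: int  # year of the event the record was taken from


# Hypothetical life-event order used for the temporal constraint.
ROLE_ORDER = {"baby": 0, "bride": 1, "deceased": 2}
# Roles assumed to occur at most once per person.
SINGLE_EVENT_ROLES = {"baby", "deceased"}


def temporally_plausible(a: Record, b: Record) -> bool:
    """Negative evidence: reject pairs whose roles and years conflict."""
    if a.role == b.role:
        # Two birth (or two death) records cannot be the same person.
        return a.role not in SINGLE_EVENT_ROLES
    first, second = sorted((a, b), key=lambda r: ROLE_ORDER[r.role])
    return first.year <= second.year


def name_similarity(a: Record, b: Record) -> float:
    """Approximate string similarity of the two names, in [0, 1]."""
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def ambiguity_weight(name: str, counts: Counter, total: int) -> float:
    """Frequent (ambiguous) names receive a lower weight than rare ones."""
    return 1.0 - counts[name.lower()] / total


def link(records: list[Record], threshold: float = 0.6):
    """Return (rid_a, rid_b, score) for pairs that pass both checks."""
    counts = Counter(r.name.lower() for r in records)
    total = len(records)
    links = []
    for a, b in combinations(records, 2):
        if not temporally_plausible(a, b):
            continue  # constraint violated: never compare this pair
        weight = min(ambiguity_weight(a.name, counts, total),
                     ambiguity_weight(b.name, counts, total))
        score = name_similarity(a, b) * weight
        if score >= threshold:
            links.append((a.rid, b.rid, score))
    return links


# Made-up example records, for illustration only.
records = [
    Record(1, "Mary Smith", "baby", 1860),
    Record(2, "Mary Smith", "bride", 1882),
    Record(3, "Mary Smith", "deceased", 1870),
    Record(4, "Flora Macdonald", "baby", 1861),
    Record(5, "Flora McDonald", "deceased", 1930),
]
links = link(records)
```

In this toy data the bride record from 1882 and the death record from 1870 are never compared, because a marriage cannot follow a death. The three "Mary Smith" records are not linked on name alone, since a name shared by most of the records carries little weight, while the rare, slightly misspelled "Flora Macdonald" pair is linked. A fuller treatment would also propagate changed attribute values and use relationship structure, as the abstract describes.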
Pages: 30