Unsupervised Graph-Based Entity Resolution for Complex Entities

Cited by: 5
Authors
Kirielle, Nishadi [1]
Christen, Peter [1]
Ranbaduge, Thilina [1]
Affiliations
[1] Australian National University, School of Computing, Canberra, ACT 2600, Australia
Keywords
Record linkage; data linkage; data cleaning; dependency graph; temporal data; ambiguity; INTEGRATION; WORLD
DOI
10.1145/3533016
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entity resolution (ER) is the process of linking records that refer to the same entity. Traditionally, this process compares attribute values of records to calculate similarities and then classifies pairs of records as referring to the same entity or not based on these similarities. Recently developed graph-based ER approaches combine relationships between records with attribute similarities to improve linkage quality. Most of these approaches only consider databases containing basic entities that have static attribute values and static relationships, such as publications in bibliographic databases. In contrast, temporal record linkage addresses the problem where attribute values of entities can change over time. However, neither existing graph-based ER nor temporal record linkage can achieve high linkage quality on databases with complex entities, where an entity (such as a person) can change its attribute values over time while having different relationships with other entities at different points in time. In this article, we propose an unsupervised graph-based ER framework that is aimed at linking records of complex entities. Our framework provides five key contributions. First, we propagate positive evidence encountered when linking records to use in subsequent links by propagating attribute values that have changed. Second, we employ negative evidence by applying temporal and link constraints to restrict which candidate record pairs to consider for linking. Third, we leverage the ambiguity of attribute values to disambiguate records that are similar but belong to different entities. Fourth, we adaptively exploit the structure of relationships to link records that have different relationships. Fifth, using graph measures, we refine matched clusters of records by removing likely wrong links between records.
We conduct extensive experiments on seven real-world datasets from different domains showing that on average our unsupervised graph-based ER framework can improve precision by up to 25% and recall by up to 29% compared to several state-of-the-art ER techniques.
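
As a rough illustration of two ideas mentioned in the abstract (traditional pairwise attribute comparison, and temporal constraints used as negative evidence), the following minimal Python sketch may help. It is not the authors' framework: the record fields, the toy data, the `similarity` measure, the 0.8 threshold, and the specific constraint rules are all assumptions made for this example only.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; field names and values are invented for illustration.
RECORDS = [
    {"id": "r1", "name": "mary smith", "address": "12 high street", "event": "birth", "year": 1870},
    {"id": "r2", "name": "mary smith", "address": "12 high street", "event": "marriage", "year": 1892},
    {"id": "r3", "name": "mary smith", "address": "12 high street", "event": "birth", "year": 1895},
    {"id": "r4", "name": "john brown", "address": "3 mill lane", "event": "death", "year": 1901},
]


def similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] (a stand-in for any comparator)."""
    return SequenceMatcher(None, a, b).ratio()


def violates_temporal_constraint(r1: dict, r2: dict) -> bool:
    """Assumed toy rules: one birth and one death per person, and no
    record may be dated before the birth or after the death."""
    first, second = sorted((r1, r2), key=lambda r: r["year"])
    if r1["event"] == r2["event"] and r1["event"] in ("birth", "death"):
        return True  # two births (or two deaths) cannot be the same person
    if second["event"] == "birth" and first["year"] < second["year"]:
        return True  # something was recorded before the birth
    if first["event"] == "death" and first["year"] < second["year"]:
        return True  # something was recorded after the death
    return False


def link(records: list, threshold: float = 0.8) -> list:
    """Classify record pairs as matches using averaged attribute similarity,
    skipping pairs ruled out by the temporal constraints."""
    matches = []
    for r1, r2 in combinations(records, 2):
        if violates_temporal_constraint(r1, r2):
            continue  # negative evidence: never compare this pair
        score = (similarity(r1["name"], r2["name"])
                 + similarity(r1["address"], r2["address"])) / 2
        if score >= threshold:
            matches.append((r1["id"], r2["id"]))
    return matches


print(link(RECORDS))
```

In this toy data, `r3` has the same attribute values as `r1` and `r2`, but the constraints exclude it because it is a second, later birth record. The sketch does not handle attribute values that change over time or relationships between records, which are the cases the article targets.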
Pages: 30