SeMBlock: A semantic-aware meta-blocking approach for entity resolution

Authors
Javdani, Delaram [1]
Rahmani, Hossein [1]
Weiss, Gerhard [2]
Affiliations
[1] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
[2] Maastricht Univ, Dept Data Sci & Knowledge Engn, Maastricht, Netherlands
Keywords
Data matching; entity resolution; meta-blocking; word embedding; locality-sensitive hashing; semantic similarity; big data integration; ALGORITHM;
DOI
10.3233/IDT-200207
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Entity resolution refers to the process of identifying, matching, and integrating records belonging to unique entities in a data set. However, a comprehensive comparison across all pairs of records leads to quadratic matching complexity. Therefore, blocking methods are used to group similar entities into small blocks before the matching. Available blocking methods typically do not consider semantic relationships among records. In this paper, we propose a Semantic-aware Meta-Blocking approach called SeMBlock. SeMBlock considers the semantic similarity of records by applying locality-sensitive hashing (LSH) based on word embedding to achieve fast and reliable blocking in a large-scale data environment. To improve the quality of the blocks created, SeMBlock builds a weighted graph of semantically similar records and prunes the graph edges. We extensively compare SeMBlock with 16 existing blocking methods, using three real-world data sets. The experimental results show that SeMBlock significantly outperforms all 16 methods with respect to two relevant measures, F-measure and pair-quality measure. F-measure and pair-quality measure of SeMBlock are approximately 7% and 27%, respectively, higher than recently released blocking methods.
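The pipeline outlined in the abstract (embed records, hash them with LSH, build a weighted graph, prune its edges) can be illustrated with a minimal sketch. This is not the authors' implementation, and everything concrete in it is an assumption: the toy word vectors, the helper names (`embed`, `make_hyperplanes`, `lsh_blocks`, `blocking_graph`, `prune_edges`), random-hyperplane LSH as the hashing family, the number of shared blocks as the edge weight, and the mean edge weight as the pruning threshold. The abstract does not specify the weighting or pruning scheme; the two used here are simple stand-ins of the kind found in the meta-blocking literature.

```python
import random
from collections import defaultdict
from itertools import combinations


def embed(text, word_vectors):
    """Average the word vectors of the known tokens in a record (assumed scheme)."""
    vecs = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    if not vecs:
        raise ValueError("record has no token with a known embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def make_hyperplanes(dim, n_bits, n_tables, seed=0):
    """Draw n_tables hash tables, each with n_bits random Gaussian hyperplanes."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes)


def lsh_blocks(vectors, tables):
    """Group record ids by signature in every table; keep buckets with 2+ records."""
    blocks = []
    for planes in tables:
        buckets = defaultdict(list)
        for rid, vec in vectors.items():
            buckets[signature(vec, planes)].append(rid)
        blocks.extend(b for b in buckets.values() if len(b) > 1)
    return blocks


def blocking_graph(blocks):
    """Edge weight = number of blocks that two records share (assumed weighting)."""
    weights = defaultdict(int)
    for block in blocks:
        for a, b in combinations(sorted(block), 2):
            weights[(a, b)] += 1
    return dict(weights)


def prune_edges(weights):
    """Keep edges whose weight is at least the mean edge weight (assumed rule)."""
    if not weights:
        return set()
    mean = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= mean}


# Hypothetical 3-dimensional word vectors, hand-written for illustration only.
TOY_VECTORS = {
    "laptop": [0.9, 0.1, 0.0], "notebook": [0.8, 0.2, 0.1],
    "computer": [0.85, 0.15, 0.05], "banana": [-0.7, 0.6, 0.2],
    "fruit": [-0.8, 0.5, 0.1],
}
records = {"r1": "laptop computer", "r2": "notebook computer", "r3": "banana fruit"}
vectors = {rid: embed(text, TOY_VECTORS) for rid, text in records.items()}
tables = make_hyperplanes(dim=3, n_bits=4, n_tables=5, seed=0)
candidates = prune_edges(blocking_graph(lsh_blocks(vectors, tables)))
print(sorted(candidates))
```

Only the pairs left in `candidates` would be passed on to the matching step, which is how blocking avoids the quadratic comparison of all record pairs. Using several hash tables trades more candidate pairs for a lower chance of missing a true match.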
Pages: 461-468
Number of pages: 8