Entity Resolution with Iterative Blocking

Authors
Whang, Steven Euijong [1]
Menestrina, David [1]
Koutrika, Georgia [1]
Theobald, Martin [1]
Garcia-Molina, Hector [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
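Keywords
entity resolution; blocking; iterative blocking
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract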
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework where the ER results of blocks are reflected to subsequently processed blocks. Blocks are now iteratively processed until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy because reflecting the ER results of blocks to other blocks may generate additional record matches. Iterative blocking may also be more efficient because processing a block now saves the processing time for other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than blocking for large datasets.
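The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record representation (a frozenset of field/value pairs), the match rule (two records match when they share at least two pairs), the merge operation (set union), and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

# Assumption: a record is a frozenset of (field, value) pairs.
def match(a, b):
    # Hypothetical match rule: records match if they share >= 2 pairs.
    return len(a & b) >= 2

def resolve_block(block):
    """Merge matching pairs inside one block until none remain."""
    records = set(block)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(records, 2):
            if match(a, b):
                records -= {a, b}
                records.add(a | b)  # hypothetical merge: union of pairs
                changed = True
                break
    return records

def build_blocks(records, fields):
    """One block per (field, value) of each blocking field."""
    blocks = {}
    for r in records:
        for field, value in r:
            if field in fields:
                blocks.setdefault((field, value), set()).add(r)
    return blocks

def iterative_blocking(records, fields):
    blocks = build_blocks(records, fields)
    resolved = set(records)
    changed = True
    while changed:  # stop when no block produces a new merge
        changed = False
        for key in list(blocks):
            before = blocks[key]
            after = resolve_block(before)
            if after == before:
                continue
            changed = True
            for old in before - after:
                # The merged record that absorbed `old`.
                new = next(m for m in after if old <= m)
                resolved.discard(old)
                resolved.add(new)
                # Reflect the merge in every block that held `old`.
                for other in blocks.values():
                    if old in other:
                        other.discard(old)
                        other.add(new)
    return resolved

r1 = frozenset({("name", "John Doe"), ("zip", "94305"), ("email", "jd@a.com")})
r2 = frozenset({("name", "John Doe"), ("zip", "94305"), ("phone", "555-1234")})
r3 = frozenset({("zip", "10001"), ("email", "jd@a.com"), ("phone", "555-1234")})

resolved = iterative_blocking([r1, r2, r3], fields={"zip", "email", "phone"})
print(len(resolved))
```

In this toy input, r3 matches neither r1 nor r2 on its own, so processing the zip, email, and phone blocks separately would leave it unmerged. Once the zip block merges r1 and r2, the merged record replaces them in the email and phone blocks, where it shares both an email and a phone number with r3, producing the additional match.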
Pages: 219-231
Page count: 13