Entity Resolution with Iterative Blocking

Authors
Whang, Steven Euijong [1]
Menestrina, David [1]
Koutrika, Georgia [1]
Theobald, Martin [1]
Garcia-Molina, Hector [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
entity resolution; blocking; iterative blocking
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process computes the similarity of every pair of records, which can be very expensive for large datasets. Blocking techniques improve ER performance by dividing the records into blocks, possibly in multiple ways, and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework in which the ER results of each block are reflected in subsequently processed blocks. Blocks are processed iteratively until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy, because reflecting the ER results of one block in other blocks may generate additional record matches. Iterative blocking may also be more efficient, because merges found in one block can reduce the number of records, and hence comparisons, in other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than simple blocking for large datasets.
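The iterative idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it assumes that records are sets of (attribute, value) pairs, a hypothetical match rule (same name or same phone number), set union as the merge function, and two hypothetical blocking criteria (name initial and zip code). When a block produces a merge, the merged record replaces its constituents in every block they occupy, and the other affected blocks are queued for reprocessing until no block contains a matching pair.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: a record is a frozenset of (attribute, value) pairs, so a merged
# record simply keeps every value contributed by its constituents.
def rec(**attrs):
    return frozenset(attrs.items())

def match(r1, r2):
    # Hypothetical match rule: the records share a name or a phone number.
    return any(attr in ("name", "phone") for attr, _ in r1 & r2)

def merge(r1, r2):
    # Hypothetical merge function: union of all (attribute, value) pairs.
    return r1 | r2

def blocking_keys(record, key_funcs):
    """Return every (criterion index, key) block that the record belongs to."""
    keys = set()
    for i, key_func in enumerate(key_funcs):
        for attr, value in record:
            key = key_func(attr, value)
            if key is not None:
                keys.add((i, key))
    return keys

def find_match(block, match):
    """Exhaustively compare pairs inside one block; return the first matching pair."""
    for r1, r2 in combinations(block, 2):
        if match(r1, r2):
            return r1, r2
    return None

def iterative_blocking(records, key_funcs, match, merge):
    blocks = defaultdict(set)
    unblocked = set()  # records with no blocking key are never compared
    for r in records:
        keys = blocking_keys(r, key_funcs)
        if not keys:
            unblocked.add(r)
        for key in keys:
            blocks[key].add(r)
    dirty = set(blocks)  # every block is processed at least once
    while dirty:
        current = dirty.pop()
        pair = find_match(blocks[current], match)
        while pair is not None:
            r1, r2 = pair
            merged = merge(r1, r2)
            # Propagate the ER result: replace r1 and r2 by the merged record
            # in every block they occupy, and requeue the other affected blocks.
            for key in blocking_keys(r1, key_funcs) | blocking_keys(r2, key_funcs):
                blocks[key].discard(r1)
                blocks[key].discard(r2)
            for key in blocking_keys(merged, key_funcs):
                blocks[key].add(merged)
                if key != current:
                    dirty.add(key)
            pair = find_match(blocks[current], match)
    return set().union(unblocked, *blocks.values())

# Hypothetical blocking criteria: first letter of the name, and the zip code.
key_funcs = [
    lambda attr, value: value[0].lower() if attr == "name" else None,
    lambda attr, value: value if attr == "zip" else None,
]
r1 = rec(name="John Doe", zip="10001")
r2 = rec(name="John Doe", phone="555-1234")
r3 = rec(phone="555-1234", zip="10001")
result = iterative_blocking([r1, r2, r3], key_funcs, match, merge)
print(len(result))  # 1
```

In this example, r2 and r3 never share a block, so processing each block once in isolation never compares them. They are resolved only because the record merged from r1 and r2 is propagated into the zip-code block, where it then matches r3 on the phone number.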
Pages: 219-231
Page count: 13