Unsupervised learning blocking keys technique for indexing Arabic entity resolution

Cited by: 0
Authors
Marwah Alian
Arafat Awajan
Banda Ramadan
Affiliations
[1] Hashemite University
[2] Princess Sumaya University for Technology
[3] Prince Sultan University
Keywords
Arabic entity resolution; Learning keys; Indexing; Arabic datasets
DOI
Not available
Abstract
Attribute values in textual datasets are subject to different types of errors introduced during data entry, such as typographical errors, pronunciation errors, and dialect variations. These errors make the entity resolution process more challenging. The iterative blocking indexing technique can be used to deal with this type of error, mainly in query access, where records are stored in more than one block. A blocking indexing technique selects the subset of object pairs that fall in the same block for later detailed similarity computation and discards pairs spread across different blocks as irrelevant. This work aims to solve such problems for Arabic text. It adapts a specific model for learning blocking keys and analyzes its performance on Arabic datasets. The learned blocking keys are passed to the Dynamic Similarity-Aware Inverted Index (DySimII), which has worked efficiently with Arabic datasets. The model is tested on a telephone book dataset that contains duplicates and errors in attribute values caused by phonetic and typing errors. The results reach a matching accuracy of 84% when using learned keys with a small number of corrupted attributes, while performance declines as the number of corrupted attributes increases.
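To illustrate the general idea of blocking described in the abstract, the following is a minimal sketch and not the authors' model or the DySimII index. The blocking key used here (first two characters of the surname plus the city) is a hand-picked, hypothetical key, and the field names `surname` and `city` and the sample records are assumptions made for illustration only. Records that share a key land in the same block, and only pairs within a block are kept as candidates for detailed comparison.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first two characters of the surname plus the city.
    # A learned blocking key would be chosen from the data instead.
    return record["surname"][:2] + "|" + record["city"]


def build_blocks(records):
    # Group record identifiers by their blocking key value.
    blocks = defaultdict(list)
    for rec_id, record in records.items():
        blocks[blocking_key(record)].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    # Only pairs inside the same block are kept for detailed comparison;
    # pairs spread across different blocks are discarded.
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Invented sample records (not taken from the paper's dataset).
records = {
    1: {"surname": "محمد", "city": "عمان"},
    2: {"surname": "محمود", "city": "عمان"},
    3: {"surname": "خالد", "city": "عمان"},
    4: {"surname": "محمد", "city": "إربد"},
}

blocks = build_blocks(records)
pairs = candidate_pairs(blocks)
print(len(blocks), "blocks;", len(pairs), "candidate pair(s) out of 6 possible")
```

With this key, only records 1 and 2 share a block, so one pair out of six is compared in detail. Record 4 has the same surname as record 1 but a different city, which shows how a key that is too strict can miss a possible match; this trade-off is the reason for learning keys from the data rather than fixing them by hand.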
Pages: 621-628
Page count: 8
Related papers
  • [1] Unsupervised learning blocking keys technique for indexing Arabic entity resolution
    Alian, Marwah
    Awajan, Arafat
    Ramadan, Banda
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (03) : 621 - 628
  • [2] Unsupervised Entity Resolution With Blocking and Graph Algorithms
    Zhang, Dongxiang
    Li, Dongsheng
    Guo, Long
    Tan, Kian-Lee
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (03) : 1501 - 1515
  • [3] Arabic real time entity resolution using inverted indexing
    Marwah Alian
    Ghazi Al-Naymat
    Banda Ramadan
    Language Resources and Evaluation, 2020, 54 : 921 - 941
  • [4] Arabic real time entity resolution using inverted indexing
    Alian, Marwah
    Al-Naymat, Ghazi
    Ramadan, Banda
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (04) : 921 - 941
  • [5] Unsupervised Bootstrapping of Active Learning for Entity Resolution
    Primpeli, Anna
    Bizer, Christian
    Keuper, Margret
    SEMANTIC WEB (ESWC 2020), 2020, 12123 : 215 - 231
  • [6] Active Blocking Scheme Learning for Entity Resolution
    Shao, Jingyu
    Wang, Qing
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 350 - 362
  • [7] Unsupervised Blocking Key Selection for Real-Time Entity Resolution
    Ramadan, Banda
    Christen, Peter
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078 : 574 - 585
  • [8] A Deep-Learning-Based Blocking Technique for Entity Linkage
    Azzalini, Fabio
    Renzi, Marco
    Tanca, Letizia
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT I, 2020, 12112 : 553 - 569
  • [9] ENTITY RESOLUTION AND BLOCKING: A REVIEW
    Vidhya, K. A.
    Geetha, T. V.
    PROCEEDINGS OF THE 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (IACC 2019), 2019, : 133 - 140
  • [10] Entity Resolution with Recursive Blocking
    Yu Shao-Qing
    BIG DATA RESEARCH, 2020, 19-20 (19-20)