Enhancing Semantic Web Entity Matching Process Using Transformer Neural Networks and Pre-trained Language Models

Cited by: 0
Authors
Jabrane, Mourad [1]
Toulaoui, Abdelfattah [1]
Hafidi, Imad [1]
Affiliations
[1] Sultan Moulay Slimane Univ, Lab Proc Engn Comp Sci & Math, Bd Beni Amir, BP 77, Khouribga, Morocco
Keywords
Entity matching; record linkage; linked data; deep learning; transformer neural networks
DOI
10.31577/cai_2024_6_1397
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Entity matching (EM) is a critical yet complex component of data cleaning and integration. Recent advances in EM have predominantly been driven by deep learning (DL) methods. These methods perform best on structured data that adheres to a well-defined, high-quality schema. However, such schema-centric DL strategies struggle with the semantic web's linked data, which tends to be voluminous, semi-structured, diverse, and often noisy. To tackle this, we introduce a novel approach that is loosely schema-aware and leverages recent developments in DL, specifically transformer neural networks and pre-trained language models. We evaluated our approach on six datasets: two tabular and four RDF datasets from the semantic web. The findings demonstrate the effectiveness of our model in managing the complexities of noisy and varied data.
Pages: 1397-1415
Page count: 19
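
To make the abstract's approach concrete, below is a minimal sketch of how a pre-trained-language-model entity matcher of the kind described here can be wired up with HuggingFace transformers. The bert-base-uncased checkpoint, the [COL]/[VAL] serialization scheme, and all helper names are illustrative assumptions, not the authors' actual pipeline, and the classification head is randomly initialized, so it would need fine-tuning on labeled entity pairs before its scores mean anything.

# Minimal sketch of PLM-based entity matching (illustrative, not the
# authors' pipeline): serialize two records as text, encode them as a
# sentence pair, and classify the pair as match / non-match.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # assumption: any BERT-style PLM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# num_labels=2 gives a binary match/non-match head; it starts out
# randomly initialized and must be fine-tuned on labeled pairs.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

def serialize(record: dict) -> str:
    """Loosely schema-aware serialization: attribute names become plain
    tokens, so records with different or partial schemas still compare."""
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items())

def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Encode the record pair as one sequence and return P(match)."""
    inputs = tokenizer(serialize(rec_a), serialize(rec_b),
                       return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Example pair: an RDF-derived record vs. a tabular record.
a = {"name": "iPhone 14 Pro", "brand": "Apple", "storage": "256GB"}
b = {"title": "Apple iPhone14 Pro (256 GB)", "category": "smartphone"}
print(f"match probability: {match_probability(a, b):.3f}")

Because attribute names are serialized as ordinary tokens rather than mapped onto a fixed schema, records with differing or partial schemas (for instance, RDF triples flattened to key-value pairs) can be fed to the same matcher; this is what "loosely schema-aware" buys over schema-centric DL matchers.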