ENHANCING SEMANTIC WEB ENTITY MATCHING PROCESS USING TRANSFORMER NEURAL NETWORKS AND PRE-TRAINED LANGUAGE MODELS

Cited by: 0
Authors
Jabrane, Mourad [1 ]
Toulaoui, Abdelfattah [1 ]
Hafidi, Imad [1 ]
Affiliations
[1] Sultan Moulay Slimane Univ, Lab Proc Engn Comp Sci & Math, Bd Beni Amir, BP 77, Khouribga, Morocco
Keywords
Entity matching; record linkage; linked data; deep learning; transformer neural networks;
DOI
10.31577/cai_2024_6_1397
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Entity matching (EM) is a critical yet complex component of data cleaning and integration. Recent advances in EM have been driven predominantly by deep learning (DL) methods, which achieve high accuracy mainly on structured data conforming to a well-defined, high-quality schema. However, such schema-centric DL strategies struggle with the semantic web's linked data, which tends to be voluminous, semi-structured, heterogeneous, and often noisy. To address this, we introduce a novel, loosely schema-aware approach that leverages recent developments in DL, specifically transformer neural networks and pre-trained language models. We evaluated our approach on six datasets: two tabular and four RDF datasets from the semantic web. The findings demonstrate the effectiveness of our model in handling the complexities of noisy and varied data.
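To make the abstract's framing concrete, the following minimal Python sketch illustrates the general family of methods it describes: casting entity matching as sequence-pair classification with a pre-trained transformer (via the Hugging Face transformers library). This is not the authors' implementation; the model choice (roberta-base), the attribute-aware [COL]/[VAL] serialization, and the example records are illustrative assumptions.

    # Minimal sketch: entity matching as sequence-pair classification
    # with a pre-trained transformer. Illustrative only, not the paper's code.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    def serialize(record: dict) -> str:
        # Flatten a (possibly semi-structured) record into text while keeping
        # attribute names, so the model stays only loosely schema-aware.
        return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items() if v)

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    # num_labels=2 attaches a fresh match/non-match head; it must be
    # fine-tuned on labeled pairs before the scores are meaningful.
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2)
    model.eval()

    # Two descriptions of (possibly) the same entity, with differing schemas.
    left = {"name": "iPhone 13 Pro", "brand": "Apple", "storage": "256 GB"}
    right = {"title": "Apple iPhone 13 Pro 256GB", "color": "graphite"}

    # Encode the pair jointly so self-attention can align tokens across
    # both entity descriptions.
    inputs = tokenizer(serialize(left), serialize(right),
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    match_prob = torch.softmax(logits, dim=-1)[0, 1].item()
    print(f"P(match) = {match_prob:.3f}")

The joint encoding is the key design choice: unlike methods that embed each record separately, a cross-encoder lets the transformer compare attribute values token by token, which is what helps with noisy, heterogeneously structured linked data.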
Pages: 1397 - 1415
Page count: 19
Related Papers
50 records in total
  • [21] Efficient Aspect Object Models Using Pre-trained Convolutional Neural Networks
    Wilkinson, Eric
    Takahashi, Takeshi
    2015 IEEE-RAS 15TH INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS), 2015, : 284 - 289
  • [22] Enhancing pre-trained language models with Chinese character morphological knowledge
    Zheng, Zhenzhong
    Wu, Xiaoming
    Liu, Xiangzhi
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [23] Enhancing radiology report generation through pre-trained language models
    Leonardi, Giorgio
    Portinale, Luigi
    Santomauro, Andrea
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2024
  • [24] Recent Progress on Named Entity Recognition Based on Pre-trained Language Models
    Yang, Binxia
    Luo, Xudong
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 799 - 804
  • [25] A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models
    Ye, Deming
    Lin, Yankai
    Li, Peng
    Sun, Maosong
    Liu, Zhiyuan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022, : 523 - 529
  • [26] Somun: entity-centric summarization incorporating pre-trained language models
    Inan, Emrah
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (10): 5301 - 5311
  • [28] A graph-based blocking approach for entity matching using pre-trained contextual embedding models
    Mugeni, John Bosco
    Amagasa, Toshiyuki
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, : 357 - 364
  • [29] A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models
    Zhang, Hanqing
    Song, Haolin
    Li, Shaoyu
    Zhou, Ming
    Song, Dawei
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [30] Incident detection and classification in renewable energy news using pre-trained language models on deep neural networks
    Wang, Qiqing
    Li, Cunbin
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2022, 22 (01) : 57 - 76