Enhancing Semantic Web Entity Matching Process Using Transformer Neural Networks and Pre-trained Language Models

Cited by: 0
Authors
Jabrane, Mourad [1]
Toulaoui, Abdelfattah [1]
Hafidi, Imad [1]
Affiliations
[1] Sultan Moulay Slimane Univ, Lab Proc Engn Comp Sci & Math, Bd Beni Amir, BP 77, Khouribga, Morocco
Keywords
Entity matching; record linkage; linked data; deep learning; transformer neural networks
DOI
10.31577/cai_2024_6_1397
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Entity matching (EM) is a critical yet complex component of data cleaning and integration. Recent advances in EM have predominantly been driven by deep learning (DL) methods. These methods perform best on structured data that adheres to a well-defined, high-quality schema. However, such schema-centric DL strategies struggle with the semantic web's linked data, which tends to be voluminous, semi-structured, diverse, and often noisy. To tackle this, we introduce a novel approach that is loosely schema-aware and leverages recent developments in DL, specifically transformer neural networks and pre-trained language models. We evaluated our approach on six datasets: two tabular and four RDF datasets from the semantic web. The findings demonstrate the effectiveness of our model in managing the complexities of noisy and varied data.
Pages: 1397-1415
Page count: 19
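
To make the abstract's approach concrete, below is a minimal sketch of how a pre-trained-language-model entity matcher of the kind described here can be wired up with HuggingFace transformers. The bert-base-uncased checkpoint, the [COL]/[VAL] serialization scheme, and all helper names are illustrative assumptions, not the authors' actual pipeline, and the classification head is randomly initialized, so it would need fine-tuning on labeled entity pairs before its scores mean anything.

# Minimal sketch of PLM-based entity matching (illustrative, not the
# authors' pipeline): serialize two records as text, encode them as a
# sentence pair, and classify the pair as match / non-match.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # assumption: any BERT-style PLM would do

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# num_labels=2 gives a binary match/non-match head; it starts out
# randomly initialized and must be fine-tuned on labeled pairs.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

def serialize(record: dict) -> str:
    """Loosely schema-aware serialization: attribute names become plain
    tokens, so records with different or partial schemas still compare."""
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in record.items())

def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Encode the record pair as one sequence and return P(match)."""
    inputs = tokenizer(serialize(rec_a), serialize(rec_b),
                       return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Example pair: an RDF-derived record vs. a tabular record.
a = {"name": "iPhone 14 Pro", "brand": "Apple", "storage": "256GB"}
b = {"title": "Apple iPhone14 Pro (256 GB)", "category": "smartphone"}
print(f"match probability: {match_probability(a, b):.3f}")

Because attribute names are serialized as ordinary tokens rather than mapped onto a fixed schema, records with differing or partial schemas (for instance, RDF triples flattened to key-value pairs) can be fed to the same matcher; this is what "loosely schema-aware" buys over schema-centric DL matchers.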