Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods to simplify the text. To evaluate the models, we created and utilized two Arabic datasets, namely EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As resources were scarce, we made these datasets available to other researchers. The EW-SEW dataset is a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset has eight simplified reference sentences for each of the 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
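The abstract reports BLEU among its evaluation metrics. As an illustrative sketch only (not the paper's implementation, which would use a standard toolkit over the full test set), a minimal sentence-level BLEU with clipped n-gram precision, a brevity penalty, and add-one smoothing can be written in plain Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Toy sentence-level BLEU (0-100): geometric mean of clipped n-gram
    precisions times a brevity penalty. Add-one smoothing keeps the score
    defined when a precision would be zero (illustrative only)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        # per-n-gram maximum count over all references, for clipping
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        p = (clipped + 1) / (total + 1)  # add-one smoothing
        log_prec += math.log(p) / max_n
    # brevity penalty against the reference length closest to the candidate's
    ref_len = min((len(r) for r in refs), key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return 100 * bp * math.exp(log_prec)
```

For corpus-level scores such as the 55.68 reported above, production toolkits (e.g. sacreBLEU) aggregate clipped counts over all sentence pairs before taking the geometric mean, rather than averaging per-sentence scores.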
Pages: 13