Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliation
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; READABILITY;
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods to simplify the text. To evaluate the models, we created and utilized two Arabic datasets, namely EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As resources were scarce, we made these datasets available to other researchers. The EW-SEW dataset is a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset has eight simplified reference sentences for each of the 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
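The abstract reports BLEU as one of its evaluation metrics, which scores a system output against reference sentences by clipped n-gram precision with a brevity penalty. The sketch below is an illustrative, simplified sentence-level BLEU (the function name `simple_bleu` and the structure are this sketch's own, not the paper's code); published evaluations such as this one use standard toolkits (e.g. sacreBLEU) rather than a hand-rolled score.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Illustration only; lacks the smoothing real toolkits apply."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            precisions.append(0.0)
            continue
        # For each n-gram, the max count over all references (clipping cap).
        max_ref = Counter()
        for r in refs:
            for g, c in Counter(ngrams(r, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))
    if min(precisions) == 0:
        return 0.0  # geometric mean collapses if any precision is zero
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in refs), key=lambda L: abs(L - len(cand)))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

With multiple references per test sentence, as in the 8-reference WikiLarge set described above, clipping is taken against the best-matching reference per n-gram, which is why extra references can only raise the score.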
Pages: 13
Related papers (50 total)
  • [1] Multilingual Controllable Transformer-Based Lexical Simplification
    Sheang, Kim Cheng
    Saggion, Horacio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (71): 109-123
  • [2] A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units
    Baniata, Laith H.
    Ampomah, Isaac K. E.
    Park, Seyoung
    SENSORS, 2021, 21 (19)
  • [3] A transformer-based approach for Arabic offline handwritten text recognition
    Momeni, Saleh
    Babaali, Bagher
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04): 3053-3062
  • [5] Training and analyzing a Transformer-based machine translation model
    Pimentel, Clovis Henrique Martins
    Pires, Thiago Blanch
    TEXTO LIVRE-LINGUAGEM E TECNOLOGIA, 2024, 17
  • [6] Transformer-Based Direct Hidden Markov Model for Machine Translation
    Wang, Weiyue
    Yang, Zijian
    Gao, Yingbo
    Ney, Hermann
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021: 23-32
  • [7] Fusion of Image-text attention for Transformer-based Multimodal Machine Translation
    Ma, Junteng
    Qin, Shihao
    Su, Lan
    Li, Xia
    Xiao, Lixian
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019: 199-204
  • [8] Transformer-Based Unified Neural Network for Quality Estimation and Transformer-Based Re-decoding Model for Machine Translation
    Chen, Cong
    Zong, Qinqin
    Luo, Qi
    Qiu, Bailian
    Li, Maoxi
    MACHINE TRANSLATION, CCMT 2020, 2020, 1328: 66-75
  • [9] Classifier Based Text Simplification for Improved Machine Translation
    Tyagi, Shruti
    Chopra, Deepti
    Mathur, Iti
    Joshi, Nisheeth
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015: 46-50
  • [10] Learning Confidence for Transformer-based Neural Machine Translation
    Lu, Yu
    Zeng, Jiali
    Zhang, Jiajun
    Wu, Shuangzhi
    Li, Mu
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 2353-2364