Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods. To evaluate the models, we created and used two Arabic datasets, EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As Arabic TS resources are scarce, we have made both datasets available to other researchers. The EW-SEW dataset is modeled on a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset provides eight simplified reference sentences for each of its 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
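The abstract reports three metrics: BLEU, SARI, and FBERT (BERTScore F1). As a rough, hypothetical illustration of how such scores might be computed with common open-source tools, the sketch below scores one toy sentence against two references using the sacrebleu, EASSE, and bert-score packages. None of these tools, nor the sample sentences, are confirmed by the abstract; this is not the authors' pipeline.

# Hypothetical evaluation sketch, not the authors' code. Assumes the
# sacrebleu, easse, and bert-score packages; sentences are toy placeholders.
import sacrebleu                            # pip install sacrebleu
from easse.sari import corpus_sari          # pip install easse
from bert_score import score as bertscore   # pip install bert-score

orig_sents = ["أصدرت اللجنة اللوائح التنظيمية المستحدثة البارحة."]  # complex source
sys_sents  = ["أصدرت اللجنة القواعد الجديدة أمس."]                 # system output
# One inner list per reference stream (WikiLarge-style; the paper uses 8).
refs = [
    ["أصدرت اللجنة القواعد الجديدة أمس."],
    ["اللجنة أصدرت قواعد جديدة أمس."],
]

bleu = sacrebleu.corpus_bleu(sys_sents, refs).score
sari = corpus_sari(orig_sents=orig_sents, sys_sents=sys_sents, refs_sents=refs)
# FBERT here means BERTScore F1; lang="ar" selects bert-score's default
# multilingual model -- the paper's exact configuration is an assumption.
_, _, f1 = bertscore(sys_sents, refs[0], lang="ar")
fbert = 100 * f1.mean().item()

print(f"BLEU = {bleu:.2f}, SARI = {sari:.2f}, FBERT = {fbert:.1f}%")

Note that sacrebleu and EASSE expect references transposed relative to bert-score: refs[i][j] is the i-th reference for the j-th sentence, whereas bert-score takes one reference list aligned with the candidates.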
Pages: 13