Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods to simplify the text. To evaluate the models, we created and utilized two Arabic datasets, namely EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As resources were scarce, we made these datasets available to other researchers. The EW-SEW dataset is a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset has eight simplified reference sentences for each of the 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
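The abstract reports BLEU among its evaluation metrics. As an illustrative sketch only (not the paper's implementation, which would use a standard toolkit over the full test set), a minimal sentence-level BLEU with clipped n-gram precision, a brevity penalty, and add-one smoothing can be written in plain Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Toy sentence-level BLEU (0-100): geometric mean of clipped n-gram
    precisions times a brevity penalty. Add-one smoothing keeps the score
    defined when a precision would be zero (illustrative only)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        # per-n-gram maximum count over all references, for clipping
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        p = (clipped + 1) / (total + 1)  # add-one smoothing
        log_prec += math.log(p) / max_n
    # brevity penalty against the reference length closest to the candidate's
    ref_len = min((len(r) for r in refs), key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return 100 * bp * math.exp(log_prec)
```

For corpus-level scores such as the 55.68 reported above, production toolkits (e.g. sacreBLEU) aggregate clipped counts over all sentence pairs before taking the geometric mean, rather than averaging per-sentence scores.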
Pages: 13