Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods. To evaluate the models, we created and used two Arabic datasets, EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As Arabic TS resources are scarce, we have made both datasets available to other researchers. The EW-SEW dataset is modeled on a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset provides eight simplified reference sentences for each of its 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
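The abstract reports three metrics: BLEU, SARI, and FBERT (BERTScore F1). As a rough, hypothetical illustration of how such scores might be computed with common open-source tools, the sketch below scores one toy sentence against two references using the sacrebleu, EASSE, and bert-score packages. None of these tools, nor the sample sentences, are confirmed by the abstract; this is not the authors' pipeline.

# Hypothetical evaluation sketch, not the authors' code. Assumes the
# sacrebleu, easse, and bert-score packages; sentences are toy placeholders.
import sacrebleu                            # pip install sacrebleu
from easse.sari import corpus_sari          # pip install easse
from bert_score import score as bertscore   # pip install bert-score

orig_sents = ["أصدرت اللجنة اللوائح التنظيمية المستحدثة البارحة."]  # complex source
sys_sents  = ["أصدرت اللجنة القواعد الجديدة أمس."]                 # system output
# One inner list per reference stream (WikiLarge-style; the paper uses 8).
refs = [
    ["أصدرت اللجنة القواعد الجديدة أمس."],
    ["اللجنة أصدرت قواعد جديدة أمس."],
]

bleu = sacrebleu.corpus_bleu(sys_sents, refs).score
sari = corpus_sari(orig_sents=orig_sents, sys_sents=sys_sents, refs_sents=refs)
# FBERT here means BERTScore F1; lang="ar" selects bert-score's default
# multilingual model -- the paper's exact configuration is an assumption.
_, _, f1 = bertscore(sys_sents, refs[0], lang="ar")
fbert = 100 * f1.mean().item()

print(f"BLEU = {bleu:.2f}, SARI = {sari:.2f}, FBERT = {fbert:.1f}%")

Note that sacrebleu and EASSE expect references transposed relative to bert-score: refs[i][j] is the i-th reference for the j-th sentence, whereas bert-score takes one reference list aligned with the candidates.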
Pages: 13