Automatic Diacritics Restoration for Modern Standard Arabic Text

Authors
Zayyan, Ayman A. [1]
Elmahdy, Mohamed [2, 3]
Husni, Husniza Binti
Al Ja'am, Jihad M. [1]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[2] German Univ Cairo, Faculty Media Engn & Technol, Cairo, Egypt
[3] Univ Utara Malaysia, Sch Comp, Coll Arts & Sci, Kedah, Malaysia
Keywords
diacritization; vowelization; Arabic; text processing
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
This paper investigates the problem of missing diacritic marks in most written Arabic resources. Our aim is to implement a scalable and extensible platform that automatically restores missing diacritic marks in Modern Standard Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation, and statistical n-gram models. The diacritization accuracy of each technique was evaluated using Diacritic Error Rate (DER) and Word Error Rate (WER). The proposed platform includes helper tools for text preprocessing and encoding conversion. It yielded a WER of 7.1% and a DER of 3.9%. When case endings were ignored, it yielded a WER of 5.1% and a DER of 2.7%.
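The abstract does not define how DER and WER are computed, so the following is only a minimal sketch under common assumptions: DER is taken as the fraction of letters whose diacritics differ from the reference, and WER as the fraction of words containing at least one such letter. The function names, the diacritic range U+064B to U+0652, and the `ignore_last_letter` switch (a rough stand-in for ignoring case endings) are illustrative choices, not the authors' implementation.

```python
# Arabic diacritic marks, fathatan (U+064B) through sukun (U+0652).
# Assumption: only this range is treated as diacritics.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_letters(word):
    """Split a word into (base_letter, attached_diacritics) pairs."""
    units = []
    for ch in word:
        if ch in DIACRITICS and units:
            base, marks = units[-1]
            units[-1] = (base, marks + ch)
        else:
            units.append((ch, ""))
    return units


def der_wer(reference, hypothesis, ignore_last_letter=False):
    """Return (DER, WER) as fractions for two whitespace-tokenized texts.

    Both texts must contain the same words with the same base letters;
    only the diacritics may differ. With ignore_last_letter=True the
    final letter of each word is skipped, a simplified (hypothetical)
    approximation of ignoring case endings.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_units = split_letters(ref_word)
        hyp_units = split_letters(hyp_word)
        if [b for b, _ in ref_units] != [b for b, _ in hyp_units]:
            raise ValueError("base letters differ between the two texts")
        if ignore_last_letter:
            ref_units, hyp_units = ref_units[:-1], hyp_units[:-1]
        # Compare marks as sorted lists so that the order of stacked
        # marks (e.g. shadda plus a vowel) does not count as an error.
        errors = sum(
            1
            for (_, ref_marks), (_, hyp_marks) in zip(ref_units, hyp_units)
            if sorted(ref_marks) != sorted(hyp_marks)
        )
        letters += len(ref_units)
        letter_errors += errors
        word_errors += 1 if errors else 0

    if letters == 0:
        raise ValueError("no letters to evaluate")
    return letter_errors / letters, word_errors / len(ref_words)


# Two three-letter words; the hypothesis gets the last vowel of the
# first word wrong (damma instead of fatha).
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0642\u064E\u0644\u064E\u0645"
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u064F \u0642\u064E\u0644\u064E\u0645"
der, wer = der_wer(reference, hypothesis)
der_no_end, wer_no_end = der_wer(reference, hypothesis, ignore_last_letter=True)
```

In this toy example the single error sits on a word-final letter, so it disappears when the last letter is skipped, which mirrors why the reported error rates drop when case endings are ignored.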
Pages: 221-225
Page count: 5