Automatic Diacritics Restoration for Modern Standard Arabic Text

Cited by: 0
Authors
Zayyan, Ayman A. [1]
Elmahdy, Mohamed [2, 3]
Husni, Husniza Binti
Al Ja'am, Jihad M. [1]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[2] German Univ Cairo, Faculty Media Engn & Technol, Cairo, Egypt
[3] Univ Utara Malaysia, Sch Comp, Coll Arts & Sci, Kedah, Malaysia
Keywords
diacritization; vowelization; Arabic; text processing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper investigates the problem of missing diacritic marks in most written Arabic resources. Our aim is to implement a scalable and extensible platform that automatically restores missing diacritic marks in Modern Standard Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation, and statistical n-gram models. The diacritization accuracy of each technique was evaluated using Diacritic Error Rate (DER) and Word Error Rate (WER). The proposed platform includes helper tools for text preprocessing and encoding conversion. It yielded a WER of 7.1% and a DER of 3.9%. When case endings were ignored, the platform yielded a WER of 5.1% and a DER of 2.7%.
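The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the choice to count every base letter in the DER denominator, and the treatment of "case ending" as simply the diacritics on a word's last letter are all assumptions made for illustration, and published definitions of DER vary on these points.

```python
# Illustrative sketch only; names and counting conventions are assumptions.

# Arabic combining diacritics: tanween forms, fatha, damma, kasra, shadda, sukun
# (Unicode U+064B through U+0652).
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def split_diacritics(word):
    """Return a list of (base_letter, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis, ignore_case_ending=False):
    """Return (DER, WER) for two diacritized strings with identical base text.

    DER here is the fraction of base letters whose diacritics differ.
    WER is the fraction of words containing at least one such letter.
    With ignore_case_ending=True the last letter of each word is skipped,
    a simplifying assumption about what "case ending" means.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("reference and hypothesis differ in word count")

    letters = letter_errors = word_errors = 0
    for ref, hyp in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref)
        hyp_pairs = split_diacritics(hyp)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base letters differ between the two texts")
        if ignore_case_ending:
            ref_pairs, hyp_pairs = ref_pairs[:-1], hyp_pairs[:-1]
        # Sort marks so that shadda+vowel order does not count as an error.
        wrong = sum(
            sorted(r) != sorted(h)
            for (_, r), (_, h) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += (wrong > 0)

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


if __name__ == "__main__":
    # "kataba waladun" vs. "katabu waladun": only the final vowel of word 1 differs.
    ref = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064C"
    hyp = "\u0643\u064E\u062A\u064E\u0628\u064F \u0648\u064E\u0644\u064E\u062F\u064C"
    print(error_rates(ref, hyp))
    print(error_rates(ref, hyp, ignore_case_ending=True))
```

In this toy pair the only mismatch sits on the last letter of the first word, so both rates drop to zero once case endings are ignored, which mirrors why the abstract reports lower error rates under that setting.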
Pages: 221-225
Number of pages: 5