Automatic Diacritics Restoration for Modern Standard Arabic Text

Authors
Zayyan, Ayman A. [1]
Elmahdy, Mohamed [2, 3]
Husni, Husniza Binti
Al Ja'am, Jihad M. [1]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[2] German Univ Cairo, Faculty Media Engn & Technol, Cairo, Egypt
[3] Univ Utara Malaysia, Sch Comp, Coll Arts & Sci, Kedah, Malaysia
Keywords
diacritization; vowelization; Arabic; text processing
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper investigates the problem of missing diacritic marks in most written Arabic resources. Our aim is to implement a scalable and extensible platform that automatically restores missing diacritic marks in Modern Standard Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation, and statistical n-gram models. The diacritization accuracy of each technique was evaluated using Diacritic Error Rate (DER) and Word Error Rate (WER). The proposed platform includes helper tools for text preprocessing and encoding conversion. It yielded a WER of 7.1% and a DER of 3.9%. When the case ending was ignored, the platform yielded a WER of 5.1% and a DER of 2.7%.
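To make the abstract's terms concrete, the following is a minimal sketch of a word-level maximum likelihood baseline together with WER and DER scoring. It is not the authors' implementation. The function names, the per-letter definition of DER, and the treatment of the case ending as the marks on a word's final letter are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict

# Assumption: "diacritics" means the marks U+064B..U+0652
# (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def strip_diacritics(word):
    """Remove all diacritic marks from a word."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def split_letters(word):
    """Return a list of (base_letter, sorted_marks) pairs for a word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    # Sort marks so that shadda+vowel and vowel+shadda compare equal.
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def train_mle(diacritized_words):
    """Map each undiacritized form to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for word in diacritized_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: c.most_common(1)[0][0] for bare, c in counts.items()}


def diacritize(words, model):
    """Restore diacritics word by word; unseen words are left unchanged."""
    return [model.get(word, word) for word in words]


def error_rates(reference, hypothesis, ignore_case_ending=False):
    """Return (WER, DER) as fractions.

    Assumes both lists are aligned word by word and share base letters.
    WER counts words with at least one wrong letter; DER counts letters
    whose marks differ (a hypothetical but common definition).
    """
    word_errors = letter_errors = letters = 0
    for ref, hyp in zip(reference, hypothesis):
        ref_pairs, hyp_pairs = split_letters(ref), split_letters(hyp)
        if ignore_case_ending:
            # Assumption: the case ending is carried by the final letter.
            ref_pairs, hyp_pairs = ref_pairs[:-1], hyp_pairs[:-1]
        wrong = sum(r != h for (_, r), (_, h) in zip(ref_pairs, hyp_pairs))
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += wrong > 0
    wer = word_errors / len(reference) if reference else 0.0
    der = letter_errors / letters if letters else 0.0
    return wer, der
```

A typical use would be to call `train_mle` on a diacritized training corpus, run `diacritize` on the stripped test words, and pass the gold and predicted word lists to `error_rates`, once with and once without `ignore_case_ending=True`. An n-gram model would replace the per-word lookup with a choice conditioned on neighbouring words.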
Pages: 221-225
Page count: 5