Automatic Diacritics Restoration for Modern Standard Arabic Text

Authors
Zayyan, Ayman A. [1]
Elmahdy, Mohamed [2, 3]
Husni, Husniza Binti
Al Ja'am, Jihad M. [1 ]
Affiliations
[1] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[2] German Univ Cairo, Faculty Media Engn & Technol, Cairo, Egypt
[3] Univ Utara Malaysia, Sch Comp, Coll Arts & Sci, Kedah, Malaysia
Keywords
diacritization; vowelization; Arabic; text processing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper investigates the problem of missing diacritic marks in most written Arabic resources. Our aim is to implement a scalable and extensible platform that automatically restores missing diacritic marks in Modern Standard Arabic text. Several rule-based and statistical techniques are proposed: a morphological analyzer-based approach, maximum likelihood estimation, and statistical n-gram models. The diacritization accuracy of each technique was evaluated using Diacritic Error Rate (DER) and Word Error Rate (WER). The proposed platform includes helper tools for text preprocessing and encoding conversion. It yielded a WER of 7.1% and a DER of 3.9%. When the case ending was ignored, the platform yielded a WER of 5.1% and a DER of 2.7%.
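The abstract does not spell out how DER and WER are computed, so the following is only a minimal sketch under common conventions: DER is taken as the share of letters whose diacritics differ from the reference, WER as the share of words containing at least one such letter, and "ignoring the case ending" as skipping the last letter of every word. The function names `split_word` and `error_rates` are hypothetical and are not taken from the paper.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_word(word):
    """Return a list of (letter, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            # Attach the mark to the letter that precedes it.
            letter, marks = pairs[-1]
            pairs[-1] = (letter, marks + ch)
    return pairs


def error_rates(reference, hypothesis, ignore_case_ending=False):
    """Return (DER, WER) as fractions, assuming both texts share the same letters."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")

    letters = letter_errors = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs, hyp_pairs = split_word(ref_word), split_word(hyp_word)
        if [l for l, _ in ref_pairs] != [l for l, _ in hyp_pairs]:
            raise ValueError("base letters differ between the two texts")
        if ignore_case_ending:
            # Assumption: the case ending is the diacritic on the last letter.
            ref_pairs, hyp_pairs = ref_pairs[:-1], hyp_pairs[:-1]
        # Sort the marks so that shadda+vowel order does not count as an error.
        wrong = sum(
            sorted(ref_marks) != sorted(hyp_marks)
            for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs)
        )
        letters += len(ref_pairs)
        letter_errors += wrong
        word_errors += wrong > 0

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / len(ref_words) if ref_words else 0.0
    return der, wer


FATHA, DAMMA = "\u064E", "\u064F"
# Two words; the hypothesis differs only in the final diacritic of the second word.
reference = f"ك{FATHA}ت{FATHA}ب{FATHA} الو{FATHA}ل{FATHA}د{DAMMA}"
hypothesis = f"ك{FATHA}ت{FATHA}ب{FATHA} الو{FATHA}ل{FATHA}د{FATHA}"
print(error_rates(reference, hypothesis))
print(error_rates(reference, hypothesis, ignore_case_ending=True))
```

In this toy pair the only mismatch sits on the final letter of the second word, so both rates fall to zero once case endings are ignored. That is the same effect, in miniature, as the gap between the two sets of figures reported in the abstract.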
Pages: 221-225
Number of pages: 5