Effect of Stemming on Text Similarity for Arabic Language at Sentence Level

Authors
Alhawarat M.O. [1]
Abdeljaber H. [1]
Hilal A. [2]
Affiliations
[1] Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Alkharj
[2] General Department, College of Preparatory Year, Prince Sattam Bin Abdulaziz University, Alkharj
Keywords
Lemmatization; Machine learning; Natural language processing; Semantic text similarity; Stemming; TF-IDF; Word embedding
DOI
10.7717/PEERJ-CS.530
Abstract
Semantic Text Similarity (STS) has several important applications in the field of Natural Language Processing (NLP). The aim of this study is to investigate the effect of stemming on text similarity for the Arabic language at the sentence level. Ten algorithms are used in total, comprising several Arabic light and heavy stemmers as well as lemmatization algorithms. Standard training and testing data sets are taken from the SemEval-2017 international workshop, Task 1, Track 1 Arabic (ar–ar). Different features are selected to study the effect of stemming on text similarity based on different similarity measures. Traditional machine learning algorithms are used, such as Support Vector Machines (SVM), Stochastic Gradient Descent (SGD) and Naïve Bayes (NB). Compared to the original text, using the stemmed and lemmatized documents in the experiments achieves enhanced Pearson correlation results. The best results were attained with the Arabic Light Stemmer (ARLSTem) and Farasa light stemmers, the Farasa and Qalsadi lemmatizers, and the Tashaphyne heavy stemmer. The best enhancement was about 7.34% in Pearson correlation. In general, stemming considerably improves the performance of sentence text similarity for the Arabic language. However, two stemmers make results worse than those for the original text: the Khoja heavy stemmer and the AlKhalil light stemmer. Copyright 2021 Alhawarat et al.
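The pipeline outlined in the abstract (stem, extract TF-IDF features, score similarity, evaluate with Pearson correlation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: `toy_light_stem` is a hypothetical stand-in that only strips the definite article and is not one of the ten algorithms studied, the smoothed TF-IDF weighting and cosine measure are assumptions, and the two example sentences are invented for illustration.

```python
import math
from collections import Counter


def toy_light_stem(token):
    """Hypothetical stand-in for a light stemmer: strips a leading 'ال'."""
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def tfidf_vectors(docs, stem=None):
    """Build sparse TF-IDF vectors (dicts) for whitespace-tokenized documents."""
    tokenized = [[stem(t) if stem else t for t in d.split()] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Smoothed IDF (an assumed weighting, chosen to avoid zero weights).
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Invented sentence pair differing only in the definite article.
pair = ["الكتاب جديد", "كتاب جديد"]
raw_sim = cosine(*tfidf_vectors(pair))
stemmed_sim = cosine(*tfidf_vectors(pair, stem=toy_light_stem))
```

In this toy pair, stemming maps both sentences to the same tokens, so `stemmed_sim` exceeds `raw_sim`. In an evaluation like the one described, such scores would be computed for every sentence pair and compared with the gold annotations via `pearson`.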
Pages: 1 / 18
Page count: 17