Automated Arabic Text Classification With P-Stemmer, Machine Learning, and a Tailored News Article Taxonomy

Cited by: 27
Authors
Kanan, Tarek [1]
Fox, Edward A. [2]
Affiliations
[1] Al Zaytoonah Univ Jordan, Fac Sci & Informat Technol, Dept Software Engn, Amman, Jordan
[2] Virginia Polytech Inst & State Univ, Virginia Tech, Coll Engn, McBryde Hall Room 114 0106, Blacksburg, VA 24061 USA
Keywords
digital libraries; information retrieval; natural language processing
DOI
10.1002/asi.23609
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.
Pages: 2667-2683
Page count: 17
Related papers
26 items in total
  • [1] P-Stemmer or NLTK Stemmer for Arabic Text Classification?
    Elbes, Mohammed
    Aldajah, Amal
    Sadaqa, Odai
    2019 SIXTH INTERNATIONAL CONFERENCE ON SOCIAL NETWORKS ANALYSIS, MANAGEMENT AND SECURITY (SNAMS), 2019: 516-520
  • [2] Improving Arabic Text Classification Using P-Stemmer
    Kanan T.
    Hawashin B.
    Alzubi S.
    Almaita E.
    Alkhatib A.
    Maria K.A.
    Elbes M.
    Recent Advances in Computer Science and Communications, 2022, 15(3): 404-411
  • [3] A Machine Learning Framework for Automated News Article Title Classification in Albanian
    Plaku, Evis
    Jahaj, Klei
    Cela, Arben
    Civici, Nikolla
    2024 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS, INISTA, 2024
  • [4] Machine learning algorithms in Arabic Text Classification: A Review
    Aboalnaser, Sara A.
    12TH INTERNATIONAL CONFERENCE ON THE DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE 2019), 2019: 290-295
  • [5] Scalable Arabic text Classification Using Machine Learning Model
    Al Mgheed, Rahaf M.
    2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2021: 483-485
  • [6] Arabic News Classification Based on the Country of Origin Using Machine Learning and Deep Learning Techniques
    Zamzami, Nuha
    Himdi, Hanen
    Sabbeh, Sahar F.
    APPLIED SCIENCES-BASEL, 2023, 13(12)
  • [7] Enhanced automated text categorization via Aquila optimizer with deep learning for Arabic news articles
    Alzaidi, Muhammad Swaileh A.
    Alshammari, Alya
    Hassan, Abdulkhaleq Q. A.
    Ebad, Shouki A.
    Al Sultan, Hanan
    Alliheedi, Mohammed A.
    Aljubailan, Ali Abdulaziz
    Alzahrani, Khadija Abdullah
    AIN SHAMS ENGINEERING JOURNAL, 2025, 16(1)
  • [8] Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model
    Al-onazi, Badriyya B.
    Alotaib, Saud S.
    Alshahrani, Saeed Masoud
    Alotaibi, Najm
    Alnfiai, Mrim M.
    Salama, Ahmed S.
    Hamza, Manar Ahmed
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74(3): 5447-5465
  • [9] An improved approach to Arabic news classification based on hyperparameter tuning of machine learning algorithms
    Jamaleddyn, Imad
    El Ayachi, Rachid
    Biniz, Mohamed
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11(2)
  • [10] Investigating the Impact of Preprocessing Techniques and Representation Models on Arabic Text Classification using Machine Learning
    Masadeh, Mahmoud
    Moustapha, A.
    Sharada, B.
    Hanumanthappa, J.
    Hemachandran, K.
    Chola, Channabasava
    Muaad, Abdullah Y.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15(1): 1115-1123