An Arabic text categorization approach using term weighting and multiple reducts

Cited by: 0
Authors
Qasem A. Al-Radaideh
Mohammed A. Al-Abrat
Affiliation
[1] Yarmouk University, Department of Computer Information Systems, Faculty of Information Technology and Computer Sciences
Source
Soft Computing | 2019 / Vol. 23
Keywords
Rough set theory; Arabic text categorization; Reducts extraction; Single reduct; Multiple reducts;
DOI
Not available
Abstract
Text categorization is the process of assigning a predefined category label to an unlabeled document based on its content. One challenge of automatic text categorization is the high dimensionality of the data, which may degrade the performance of the categorization model. This paper proposes an approach for categorizing Arabic text based on term weighting and the reduct concept of rough set theory, which reduces the number of terms used to generate the classification rules that form the classifier. The paper also proposes a multiple minimal reduct extraction algorithm that improves on the Quick reduct algorithm. The multiple reducts are used to generate the set of classification rules that represents the rough set classifier. To evaluate the proposed approach, an Arabic corpus of 2700 documents in nine categories is used. The experiments compare the results of the proposed approach when using multiple and single minimal reducts. The proposed approach achieved an accuracy of 94% with multiple reducts, outperforming the single-reduct method, which achieved 86%. The experiments also showed that the proposed approach outperforms both the K-NN and J48 algorithms in classification accuracy on the dataset at hand.
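For orientation, the reduct idea mentioned in the abstract can be sketched with the baseline greedy Quick reduct procedure. This is a minimal illustration only, not the authors' improved multiple-reduct algorithm: the function names, the binary term-presence encoding, and the toy documents are all assumptions made for the example.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree of the label on `attrs`.

    Rows are grouped by their values on `attrs`; a group counts toward the
    positive region only if all of its rows share one label.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedy Quick reduct sketch: add the attribute that raises the
    dependency degree most until it matches that of the full attribute set.
    The result is a single attribute subset, not guaranteed to be minimal.
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        remaining = [a for a in all_attrs if a not in reduct]
        # Ties are broken in favor of the lowest attribute index.
        best = max(remaining, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Hypothetical toy data: each row marks presence (1) or absence (0) of
# three terms in a document; labels are the document categories.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
labels = ["sport", "sport", "economy", "economy", "sport", "economy"]

print(quick_reduct(rows, labels))
```

In this toy table the first term alone separates the two categories, so the other two terms can be dropped without losing any classification power. A multiple-reduct method, as described in the abstract, would look for several such subsets rather than stopping at the first one.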
Pages: 5849 - 5863
Number of pages: 14
Related papers
50 in total
  • [1] An Arabic text categorization approach using term weighting and multiple reducts
    Al-Radaideh, Qasem A.
    Al-Abrat, Mohammed A.
    SOFT COMPUTING, 2019, 23 (14) : 5849 - 5863
  • [2] A term weighting approach for text categorization
    Lee, KC
    Kang, SS
    Hahn, KS
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 673 - 678
  • [3] A Redundancy Based Term Weighting Approach for Text Categorization
    Lu, Zhen-Yu
    Lin, Yong-Min
    Zhao, Shuang
    Chen, Jing-Nian
    Zhu, Wei-Dong
    2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING, VOL 2, PROCEEDINGS, 2009, : 36 - +
  • [4] Imbalanced Text Categorization Based on Positive and Negative Term Weighting Approach
    Naderalvojoud, Behzad
    Sezer, Ebru Akcapinar
    Ucan, Alaettin
    TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 325 - 333
  • [5] Two novel term weighting for text categorization
    Matsunaga, L. A.
    Ebecken, N. F. F.
    DATA MINING IX: DATA MINING, PROTECTION, DETECTION AND OTHER SECURITY TECHNOLOGIES, 2008, 40 : 105 - 114
  • [6] Supervised term weighting for automated text categorization
    Debole, F
    Sebastiani, F
    TEXT MINING AND ITS APPLICATIONS, 2004, 138 : 81 - 97
  • [7] A semantic term weighting scheme for text categorization
    Luo, Qiming
    Chen, Enhong
    Xiong, Hui
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 12708 - 12716
  • [8] Term Weighting using Contextual Information for Categorization of Unstructured Text Documents
    Kulkarni, Anagha
    Tokekar, Vrinda
    Kulkarni, Parag
    2015 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2015,
  • [9] Nonlinear transformation of term frequencies for term weighting in text categorization
    Erenel, Zafer
    Altincay, Hakan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2012, 25 (07) : 1505 - 1514
  • [10] A Novel Term Weighting Scheme and an Approach for Classification of Agricultural Arabic Text Complaints
    Guru, D. S.
    Ali, Mostafa
    Suhil, Mahamad
    2018 IEEE 2ND INTERNATIONAL WORKSHOP ON ARABIC AND DERIVED SCRIPT ANALYSIS AND RECOGNITION (ASAR), 2018, : 24 - 28