An Arabic text categorization approach using term weighting and multiple reducts

Cited by: 0
Authors
Qasem A. Al-Radaideh
Mohammed A. Al-Abrat
Affiliation
[1] Yarmouk University, Department of Computer Information Systems, Faculty of Information Technology and Computer Sciences
Source
Soft Computing | 2019 / Vol. 23
Keywords
Rough set theory; Arabic text categorization; Reducts extraction; Single reduct; Multiple reducts
DOI
Not available
Abstract
Text categorization is the process of assigning a predefined category label to an unlabeled document based on its content. One of the challenges of automatic text categorization is the high dimensionality of the data, which may degrade the performance of the categorization model. This paper proposes an approach for categorizing Arabic text based on term weighting and the reduct concept of rough set theory, which reduces the number of terms used to generate the classification rules that form the classifier. The paper also proposes a multiple minimal reduct extraction algorithm that improves on the Quick reduct algorithm. The multiple reducts are used to generate the set of classification rules that represent the rough set classifier. To evaluate the proposed approach, an Arabic corpus of 2700 documents in nine categories is used. In the experiments, we compared the results of the proposed approach when using multiple minimal reducts and when using a single minimal reduct. The results showed that the proposed approach achieved an accuracy of 94% when using multiple reducts, outperforming the single reduct method, which achieved an accuracy of 86%. The results also showed that the proposed approach outperforms both the K-NN and J48 algorithms in classification accuracy on the dataset used.
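To make the rough set vocabulary above more concrete, the following is a minimal sketch, not the authors' implementation, of the general machinery: the dependency degree γ (share of documents in the positive region), the classic greedy Quick reduct search, a way to collect several reducts, and certain rules from each reduct that vote on a category. Assumptions: documents are already reduced to discretized term weights (here binary term presence, as if a weight had been thresholded), attributes are integer column indices, and all function names are hypothetical. The seeding-and-pruning strategy in `multiple_reducts` is an illustrative stand-in only; the abstract does not describe how the paper's improved algorithm works.

```python
from collections import Counter, defaultdict

def partition(rows, attrs):
    """Group document indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """gamma = |POS_attrs(D)| / |U|: share of documents in label-consistent classes."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)

def quick_reduct(rows, labels, attrs, start=()):
    """Greedy Quick reduct: add the attribute that most raises gamma until it
    matches gamma of the full attribute set (optionally seeded with `start`)."""
    target = dependency(rows, labels, attrs)
    reduct = list(start)
    current = dependency(rows, labels, reduct)
    while current < target:
        best, best_gamma = None, current
        for a in attrs:
            if a not in reduct:
                g = dependency(rows, labels, reduct + [a])
                if g > best_gamma:
                    best, best_gamma = a, g
        if best is None:  # no single attribute raises gamma; stop early
            break
        reduct.append(best)
        current = best_gamma
    return reduct

def prune(rows, labels, reduct, target):
    """Drop attributes whose removal keeps gamma at the target value."""
    kept = list(reduct)
    for a in list(kept):
        rest = [x for x in kept if x != a]
        if dependency(rows, labels, rest) == target:
            kept = rest
    return kept

def multiple_reducts(rows, labels, attrs):
    """HYPOTHETICAL strategy: seed Quick reduct with each attribute, prune
    redundant attributes, keep the distinct reducts of the smallest size found."""
    target = dependency(rows, labels, attrs)
    found = {frozenset(prune(rows, labels,
                             quick_reduct(rows, labels, attrs, [a]), target))
             for a in attrs}
    smallest = min(len(r) for r in found)
    return sorted(sorted(r) for r in found if len(r) == smallest)

def rules_for(rows, labels, reduct):
    """Each label-consistent equivalence class on the reduct becomes a rule."""
    rules = {}
    for block in partition(rows, reduct):
        classes = {labels[i] for i in block}
        if len(classes) == 1:
            rules[tuple(rows[block[0]][a] for a in reduct)] = classes.pop()
    return rules

def classify(row, rule_sets):
    """One vote per reduct whose rule fires; majority label, or None if none fire."""
    votes = Counter()
    for reduct, rules in rule_sets:
        key = tuple(row[a] for a in reduct)
        if key in rules:
            votes[rules[key]] += 1
    return votes.most_common(1)[0][0] if votes else None

if __name__ == "__main__":
    # Toy data: presence of four hypothetical terms in six documents.
    rows = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 1],
            [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 0, 0]]
    labels = ["sport"] * 3 + ["economy"] * 3
    attrs = [0, 1, 2, 3]
    print(quick_reduct(rows, labels, attrs))   # [0]
    reducts = multiple_reducts(rows, labels, attrs)
    print(reducts)                             # [[0], [1]]
    rule_sets = [(r, rules_for(rows, labels, r)) for r in reducts]
    print(classify([1, 0, 0, 0], rule_sets))   # sport
```

In this toy table, terms 0 and 1 each separate the two categories on their own, so plain Quick reduct stops at a single reduct while the multiple-reduct sketch finds both, and a new document is labeled by the votes of rules from every reduct. The greedy search does not guarantee truly minimal reducts, and ties or uncovered documents would need a fallback policy in a real classifier; here ties are resolved arbitrarily and uncovered documents return `None`.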
Pages: 5849 / 5863
Page count: 14