Feature Selection by Using Heuristic Methods for Text Classification

Authors
Sel, Ilhami [1]
Yeroglu, Celalettin [1]
Hanbay, Davut [1]
Affiliation
[1] Inonu Univ, Department of Computer Engineering (Bilgisayar Muhendisligi Bolumu), Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Whale Optimization; Grey Wolf Optimization; Chi-Square
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection can be defined as choosing the subset of features that best represents a data set in machine learning applications; in other words, removing unnecessary data that has no effect on the result. In classification problems, reducing the dimensionality through feature selection can increase both the efficiency and the accuracy of the system. In this study, a text classification application is built on the "20 Newsgroups" news data set. The pre-processed news texts were converted to vectors with the Doc2Vec method, and the resulting data set was classified with the Naive Bayes method. Subsequently, feature subsets were formed using nature-inspired heuristic methods (the Whale and Grey Wolf Optimization algorithms) and the Chi-square method for feature selection. The data were then reclassified and the results compared. While the system with 600 features scored 0.9214 before feature selection, the 100-feature models created afterwards performed better (0.94095, 0.93833, and 0.93619).
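The abstract combines two kinds of feature selection: a filter (Chi-square) that ranks features independently of the classifier, and wrapper-style metaheuristics (Whale and Grey Wolf Optimization) that search over feature subsets. Below is a minimal, stdlib-only Python sketch of both ideas under stated assumptions; it is not the authors' implementation. The names `chi2_scores`, `top_k`, `binary_gwo` and `toy_fitness` are hypothetical. The chi-square scoring assumes non-negative features (Doc2Vec vectors can contain negative values, so they would need shifting or scaling first; the abstract does not say how this was handled). The binary Grey Wolf Optimizer uses one common sigmoid-transfer binarization, and its fitness function is a toy stand-in for what, in the study's setting, would be the Naive Bayes score on the selected Doc2Vec dimensions.

```python
import math
import random


def chi2_scores(X, y):
    """Chi-square score per feature; X is a list of rows, y the class labels.

    Assumption: all feature values are non-negative. Observed = per-class sum
    of a feature; expected = class prior * total sum of that feature.
    """
    n_samples, n_features = len(X), len(X[0])
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        prior = len(rows) / n_samples
        for j in range(n_features):
            observed = sum(row[j] for row in rows)
            expected = prior * totals[j]
            if expected > 0:  # skip all-zero features instead of dividing by zero
                scores[j] += (observed - expected) ** 2 / expected
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def binary_gwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Binary Grey Wolf Optimizer (sigmoid-transfer variant) maximizing fitness(mask)."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_wolves)]
    # Leaders are the three best (score, mask) pairs seen so far.
    leaders = sorted(((fitness(w), w) for w in wolves), key=lambda s: s[0], reverse=True)[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient, decays linearly from 2
        for i, wolf in enumerate(wolves):
            new = []
            for d in range(n_features):
                pulls = []
                for _, leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    pulls.append(leader[d] - A * D)
                x = sum(pulls) / 3  # average pull toward alpha, beta and delta
                prob = 1.0 / (1.0 + math.exp(-10 * (x - 0.5)))  # sigmoid transfer
                new.append(1 if rng.random() < prob else 0)
            wolves[i] = new
        pool = leaders + [(fitness(w), w) for w in wolves]
        leaders = sorted(pool, key=lambda s: s[0], reverse=True)[:3]
    best_score, best_mask = leaders[0]
    return best_mask, best_score


# Toy filter example: feature 2 is constant, so it carries no class information.
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
y = [0, 0, 1, 1]
print(chi2_scores(X, y))            # [2.0, 2.0, 0.0]
print(top_k(chi2_scores(X, y), 2))  # [0, 1]

USEFUL = {0, 2, 5}  # hypothetical "informative" features for the demo


def toy_fitness(mask):
    # Stand-in for classifier accuracy: reward useful features, penalize subset size.
    return sum(mask[j] for j in USEFUL) - 0.1 * sum(mask)


mask, score = binary_gwo(toy_fitness, n_features=8)
print(mask, score)
```

In this sketch, a 600-to-100 reduction with the filter would correspond to `top_k(scores, 100)`. A Whale Optimization variant could reuse the same mask-plus-fitness wrapper and change only the position-update rule.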
Pages: 6