Feature Selection by Using Heuristic Methods for Text Classification

Authors
Sel, Ilhami [1]
Yeroglu, Celalettin [1]
Hanbay, Davut [1]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Whale Optimization; Grey Wolf Optimization; Chi-Square;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection can be defined as choosing the subset of features that best represents the data set in a machine learning application; in other words, removing unnecessary data that has no effect on the result. In classification problems, reducing the dimensionality through feature selection can increase both the efficiency and the accuracy of the system. In this study, a text classification application is built on the "20 News Group" data set released by the Reuters news agency. The pre-processed news data were converted to vectors with the Doc2Vec method, and the resulting data set was classified with the Naive Bayes method. Subsequently, a subset of the features was selected using two nature-inspired heuristic methods (the Whale and Grey Wolf Optimization algorithms) and the Chi-square method. Classification was then repeated on the reduced data and the results were compared. While the success rate of the system with 600 features before feature selection is 0.9214, the 100-feature models created afterwards score higher (0.94095, 0.93833, and 0.93619).
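The chi-square selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how chi-square was applied to the real-valued Doc2Vec vectors, so the sketch assumes the features have already been binarized, and the function names (`chi_square_score`, `select_top_k`) and the toy data are hypothetical. It scores each feature against the class labels with the Pearson chi-square statistic and keeps the k highest-scoring features.

```python
from collections import Counter


def chi_square_score(feature_column, labels):
    """Pearson chi-square statistic between one discrete feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature_column, labels))
    feature_totals = Counter(feature_column)
    class_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in class_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * c_count / n
            diff = observed.get((f_value, c_value), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(rows, labels, k):
    """Return the indices of the k best-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [
        chi_square_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data (assumption): 6 documents, 3 binarized features, 2 classes.
rows = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = ["sci", "sci", "sci", "rec", "rec", "rec"]

selected, scores = select_top_k(rows, labels, k=1)
print(selected)  # feature 0 separates the two classes perfectly
```

In a setting like the one the abstract describes, k would be 100 out of 600 features, and the reduced matrix would then be passed back to the classifier for comparison with the full-feature run.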
Pages: 6