A New Performance Metric to Evaluate Filter Feature Selection Methods in Text Classification

Cited by: 1
Authors
Cekik, Rasim [1]
Kaya, Mahmut [2]
Affiliations
[1] Sirnak Univ, Fac Engn, Dept Comp Engn, Sirnak, Turkiye
[2] Firat Univ, Fac Engn, Dept Artificial Intelligence & Data Engn, Elazig, Turkiye
Keywords
Selection error; text classification; feature selection; filtering methods; performance metric
DOI
10.3897/jucs.111675
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
High dimensionality and sparsity are the primary issues in text classification. Feature selection, which reduces the feature space to a subset of features, is the most effective way to address them, and filter techniques are the most common and effective methods for this process. Performance metrics such as Micro-F1, Macro-F1, and Accuracy are used to evaluate filter methods for feature selection on datasets, but these metrics depend on a classification algorithm. Moreover, filter techniques evaluate the information carried by each feature individually, without considering the relationships between features. Under such an approach, the actual performance of the filter technique may not be determined, which makes existing metrics insufficient for testing the validity of a proposed method. To address this, this study proposes a novel performance metric, Selection Error (SE), to evaluate the actual performance of filter techniques. The Selection Error metric allows the information value of the selected features to be analyzed more accurately than existing methods, without relying on a classifier. The feature selection performance of the filter approaches was evaluated on six different datasets using both Selection Error and traditional performance metrics. The results show a strong relationship between the proposed performance metric and the classification performance metrics. Selection Error aims to contribute significantly to the literature by demonstrating the success of filter feature selection methods independently of classifier performance.
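The abstract's point that filter techniques score each feature on its own, ignoring relationships between features, can be illustrated with a minimal sketch. It uses the chi-square statistic, a common filter score in text classification, chosen here only as an example: it is not the Selection Error metric, which the abstract does not define. The toy corpus, the helper names (`chi_square`, `max_chi_square`, `select_top_k`, `presence`), and the max-over-classes aggregation are illustrative assumptions.

```python
# Illustrative sketch only: a univariate chi-square filter on binary term presence.
# Not the Selection Error metric; corpus and helper names are hypothetical.

def chi_square(docs, labels, term, target):
    """Chi-square score of `term` for class `target`.

    a: docs in target class containing term
    b: docs outside target class containing term
    c: docs in target class without term
    d: docs outside target class without term
    """
    n = len(docs)
    a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
    b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
    c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == target)
    d = n - a - b - c
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def max_chi_square(docs, labels, term):
    # Aggregate per-class scores by taking the maximum (an assumed choice).
    return max(chi_square(docs, labels, term, cls) for cls in set(labels))


def select_top_k(docs, labels, k):
    # Each term is scored independently of every other term.
    vocab = sorted(set().union(*docs))
    scores = {t: max_chi_square(docs, labels, t) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:k], scores


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"goal", "match", "team"},
    {"goal", "match", "score"},
    {"goal", "match", "team"},
    {"vote", "party", "team"},
    {"vote", "party", "score"},
    {"vote", "party"},
]
labels = ["sport", "sport", "sport", "politics", "politics", "politics"]


def presence(term):
    # Document-presence pattern of a term across the corpus.
    return tuple(term in doc for doc in docs)


top2, scores = select_top_k(docs, labels, 2)
print(top2)                                  # ['goal', 'match']
print(presence("goal") == presence("match"))  # True: the two terms are redundant
```

Because each term is scored in isolation, `goal` and `match`, whose document-presence patterns are identical, both land in the top two even though the second adds no information beyond the first. Per-feature scores cannot detect this kind of redundancy.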
Pages: 978-1005
Number of pages: 28