A New Performance Metric to Evaluate Filter Feature Selection Methods in Text Classification

Cited by: 1
Authors
Cekik, Rasim [1]
Kaya, Mahmut [2]
Affiliations
[1] Sirnak Univ, Fac Engn, Dept Comp Engn, Sirnak, Turkiye
[2] Firat Univ, Fac Engn, Dept Artificial Intelligence & Data Engn, Elazig, Turkiye
Keywords
Selection error; text classification; feature selection; filtering methods; performance metric
DOI
10.3897/jucs.111675
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
High dimensionality and sparsity are the primary issues in text classification. The most effective way to address them is feature selection, that is, selecting a subset of the features. The most common and effective methods for this purpose are filter techniques. Performance metrics such as Micro-F1, Macro-F1, and Accuracy are used to evaluate filter methods on datasets. Such metrics depend on a classification algorithm. However, filter techniques evaluate the information carried by each feature individually, without considering the relationships between features. As a result, classifier-based metrics may not reveal the actual performance of the filter technique, which makes them insufficient for testing the validity of a proposed method. To address this, the study proposes a novel performance metric called Selection Error (SE) for evaluating the actual performance of filter techniques. The Selection Error metric analyzes the information value of the selected features more accurately than existing metrics, without relying on a classifier. The filter approaches were evaluated on six different datasets with both Selection Error and traditional performance metrics. The results show a strong relationship between the proposed metric and the classification performance metrics. Selection Error aims to contribute significantly to the literature by demonstrating the success of filter feature selection methods regardless of classifier performance.
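To illustrate what "evaluating each feature individually" means for a filter technique, here is a minimal sketch that ranks terms with the chi-square statistic, one commonly used filter. The abstract does not say which filters the study evaluated, and this is not the Selection Error metric, whose definition the abstract does not give. The toy corpus and the function names `chi_square_scores` and `select_top_k` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score every term against one class, independently of all other terms."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos = Counter()  # number of positive-class documents containing the term
    df_neg = Counter()  # number of other-class documents containing the term
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))  # count each term once per document

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # term present, positive class
        b = df_neg[term]   # term present, other class
        c = n_pos - a      # term absent, positive class
        d = n_neg - b      # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Chi-square statistic of the 2x2 term/class contingency table.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: tokenized documents and their class labels.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "win"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = chi_square_scores(docs, labels, positive="sport")
selected = select_top_k(scores, 4)
print(selected)
```

Each term's score depends only on its own contingency table, so redundant terms (here `goal` and `team`, which occur in exactly the same documents) receive identical scores and are both selected. This is the kind of inter-feature relationship that, according to the abstract, filter techniques do not take into account.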
Pages: 978-1005
Number of pages: 28
Related papers
50 in total
  • [21] Two new feature selection metrics for text classification
    Sahin, Durmus Ozkan
    Kilic, Erdal
    AUTOMATIKA, 2019, 60 (02) : 162 - 171
  • [22] A novel multivariate filter method for feature selection in text classification problems
    Labani, Mahdieh
    Moradi, Parham
    Ahmadizar, Fardin
    Jalili, Mahdi
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2018, 70 : 25 - 37
  • [23] Feature selection methods for text classification: a systematic literature review
    Pintas, Julliano Trindade
    Fernandes, Leandro A. F.
    Garcia, Ana Cristina Bicharra
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (08) : 6149 - 6200
  • [24] On Two-Stage Feature Selection Methods for Text Classification
    Uysal, Alper Kursat
    IEEE ACCESS, 2018, 6 : 43233 - 43251
  • [25] Feature selection methods for text classification: a systematic literature review
    Julliano Trindade Pintas
    Leandro A. F. Fernandes
    Ana Cristina Bicharra Garcia
    Artificial Intelligence Review, 2021, 54 : 6149 - 6200
  • [26] Arabic Text Classification: A Review Study on Feature Selection Methods
    Hijazi, Musab Mustafa
    Zeki, Akram
    Ismail, Amelia
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 554 - 559
  • [27] Comparing multiple categories of feature selection methods for text classification
    Zheng, Wanwan
    Jin, Mingzhe
    DIGITAL SCHOLARSHIP IN THE HUMANITIES, 2020, 35 (01) : 208 - 224
  • [28] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [29] Optimizing text classification through efficient feature selection based on quality metric
    Jean-Charles Lamirel
    Pascal Cuxac
    Aneesh Sreevallabh Chivukula
    Kafil Hajlaoui
    Journal of Intelligent Information Systems, 2015, 45 : 379 - 396
  • [30] Optimizing text classification through efficient feature selection based on quality metric
    Lamirel, Jean-Charles
    Cuxac, Pascal
    Chivukula, Aneesh Sreevallabh
    Hajlaoui, Kafil
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2015, 45 (03) : 379 - 396