Optimizing text classification through efficient feature selection based on quality metric

Times cited: 23
Authors
Lamirel, Jean-Charles [1]
Cuxac, Pascal [2]
Chivukula, Aneesh Sreevallabh [3]
Hajlaoui, Kafil [3]
Affiliations
[1] LORIA, INRIA Nancy Grand Est, SYNALP Team, Vandoeuvre Les Nancy, France
[2] INIST CNRS, Vandoeuvre Les Nancy, France
[3] International Institute of Information Technology, Center for Data Engineering, Gachibowli, Hyderabad, Andhra Pradesh, India
Keywords
Feature maximization; Clustering quality index; Feature selection; Supervised learning; Unbalanced data; Text
DOI
10.1007/s10844-014-0317-4
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. In this paper, we show that a simple adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. The method is tested on different types of textual datasets. The paper shows that the proposed method provides a very significant performance increase compared with state-of-the-art methods in all the studied cases, even when only a single bag-of-words model is used for data description. Interestingly, the largest performance gain is obtained for the classification of highly unbalanced, highly multidimensional and noisy data with a high degree of similarity between the classes.
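The abstract does not spell out the metric, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes the feature recall / feature predominance / feature F-measure definitions used in the feature maximization literature: FF_c(f) is the harmonic mean of the share of feature f's weight that falls in class c and the share of class c's weight carried by f. The selection and contrast rule (keep f for c when FF_c(f) exceeds both f's mean FF over all classes and the global mean, and use FF_c(f) divided by that mean as a contrast gain) is a simplified assumption and may differ from the paper's exact thresholds. The function names `feature_f_measures` and `select_and_contrast` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Doc = Dict[str, float]  # feature -> non-negative weight (e.g. term frequency)


def feature_f_measures(docs: List[Doc], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Return FF_c(f), the harmonic mean of feature recall and feature predominance."""
    class_feat: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for feat, w in doc.items():
            class_feat[label][feat] += w
    feat_total: Dict[str, float] = defaultdict(float)  # weight of f summed over all classes
    for weights in class_feat.values():
        for feat, w in weights.items():
            feat_total[feat] += w
    ff: Dict[str, Dict[str, float]] = {}
    for label, weights in class_feat.items():
        class_total = sum(weights.values())  # weight of all features in class c
        ff[label] = {}
        for feat, w in weights.items():
            if w <= 0:
                continue  # skip zero-weight features (avoids division by zero)
            recall = w / feat_total[feat]   # share of f's weight falling in c
            predominance = w / class_total  # share of c's weight carried by f
            ff[label][feat] = 2 * recall * predominance / (recall + predominance)
    return ff


def select_and_contrast(ff: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """ASSUMED rule: keep f for c if FF_c(f) beats f's mean FF and the global mean;
    the kept value is the contrast gain FF_c(f) / mean FF of f."""
    n_classes = len(ff)
    feat_sum: Dict[str, float] = defaultdict(float)
    for fmap in ff.values():
        for feat, v in fmap.items():
            feat_sum[feat] += v
    # Mean over ALL classes, counting classes where f is absent as 0 (assumption).
    mean_ff = {feat: s / n_classes for feat, s in feat_sum.items()}
    if not mean_ff:
        return {label: {} for label in ff}
    global_mean = sum(mean_ff.values()) / len(mean_ff)
    return {
        label: {
            feat: v / mean_ff[feat]
            for feat, v in fmap.items()
            if v > mean_ff[feat] and v > global_mean
        }
        for label, fmap in ff.items()
    }


# Hypothetical toy corpus: two classes sharing the term "team".
docs = [
    {"goal": 2.0, "match": 1.0},
    {"goal": 1.0, "team": 1.0},
    {"vote": 2.0, "team": 1.0},
    {"vote": 1.0, "party": 1.0},
]
labels = ["sport", "sport", "politics", "politics"]
selected = select_and_contrast(feature_f_measures(docs, labels))
print(selected["sport"])     # {'goal': 2.0, 'match': 2.0}
print(selected["politics"])  # {'vote': 2.0, 'party': 2.0}
```

On this toy data, the shared term `team` has the same FF in both classes, so it never exceeds its own mean and is dropped, while each class-specific term is kept with a contrast gain of 2.0 (its FF divided by its average over two classes). In a classifier, such gains could be used to reweight the retained features of each class.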
Pages: 379-396
Page count: 18
Published in: Journal of Intelligent Information Systems, 2015, vol. 45