Optimizing text classification through efficient feature selection based on quality metric

Cited by: 23
Authors
Lamirel, Jean-Charles [1]
Cuxac, Pascal [2]
Chivukula, Aneesh Sreevallabh [3]
Hajlaoui, Kafil [3]
Affiliations
[1] LORIA, INRIA Nancy Grand Est, SYNALP Team, Vandoeuvre Les Nancy, France
[2] INIST CNRS, Vandoeuvre Les Nancy, France
[3] Int Inst Informat Technol, Ctr Data Engn, Gachibowli Hyderabad, Andhra Pradesh, India
Keywords
Feature maximization; Clustering quality index; Feature selection; Supervised learning; Unbalanced data; Text;
DOI
10.1007/s10844-014-0317-4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. In this paper we show that a simple adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. The method is evaluated on different types of textual datasets. The paper shows that the proposed method provides a very significant performance increase, compared to state-of-the-art methods, in all the studied cases, even when a single bag-of-words model is used for data description. Interestingly, the most significant performance gain is obtained in the classification of highly unbalanced, highly multidimensional and noisy data, with a high degree of similarity between the classes.
Pages: 379 - 396
Page count: 18
Related papers
50 in total
  • [21] Feature selection for text classification: A review
    Deng, Xuelian
    Li, Yuqing
    Weng, Jian
    Zhang, Jilian
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 : 3797 - 3816
  • [22] Feature Selection for Ordinal Text Classification
    Baccianella, Stefano
    Esuli, Andrea
    Sebastiani, Fabrizio
    NEURAL COMPUTATION, 2014, 26 (03) : 557 - 591
  • [23] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
  • [24] Improving Short Text Classification through Better Feature Space Selection
    Wang, Meng
    Lin, Lanfen
    Wang, Feng
    2013 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2013, : 120 - 124
  • [26] A high-quality feature selection method based on frequent and correlated items for text classification
    Farghaly, Heba Mamdouh
    Abd El-Hafeez, Tarek
    SOFT COMPUTING, 2023, 27 (16) : 11259 - 11274
  • [27] Feature selection based on absolute deviation factor for text classification
    Jin, Lingbin
    Zhang, Li
    Zhao, Lei
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [28] Cluster Based Symbolic Representation and Feature Selection for Text Classification
    Harish, B. S.
    Guru, D. S.
    Manjunath, S.
    Dinesh, R.
    ADVANCED DATA MINING AND APPLICATIONS (ADMA 2010), PT II, 2010, 6441 : 158 - 166
  • [29] Firefly Algorithm based Feature Selection for Arabic Text Classification
    Marie-Sainte, Souad Larabi
    Alalyani, Nada
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2020, 32 (03) : 320 - 328
  • [30] Text Classification Based on Naive Bayes Algorithm with Feature Selection
    Chen, Zhenguo
    Shi, Guang
    Wang, Xiaoju
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (10) : 4255 - 4260