A two-stage feature selection method for text categorization

Cited by: 43
Authors
Meng, Jiana [1 ,2 ]
Lin, Hongfei [1 ]
Yu, Yuhai [1 ,3 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
[2] Dalian Nationalities Univ, Coll Sci, Dalian 116600, Peoples R China
[3] Dalian Nationalities Univ, Sch Comp Sci & Engn, Dalian 116600, Peoples R China
Keywords
Feature selection; Text categorization; Latent semantic indexing; Support vector machine
DOI
10.1016/j.camwa.2011.07.045
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching represent each document in the vector space model; however, this representation requires a high-dimensional space and ignores the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing method can overcome this problem by replacing individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, in this paper we propose a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of the term space; then we construct a new semantic space between terms based on the latent semantic indexing method. In experiments on spam database categorization, we find that our two-stage feature selection method performs better. (C) 2011 Elsevier Ltd. All rights reserved.
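The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's novel first-stage selector, so this sketch substitutes a plain chi-square score as a stand-in; the second stage uses a truncated SVD, which is the usual way to compute latent semantic indexing. The function names (`chi2_scores`, `two_stage_reduce`), the toy term-document matrix, and the parameter values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score of each term against a binary label.

    ASSUMPTION: chi-square is a stand-in; the paper's own stage-1
    selector is not described in the abstract.
    """
    present = (X > 0).astype(float)      # docs x terms, 1 if term occurs
    pos = (y == 1).astype(float)         # 1 for the positive class
    n = float(len(y))
    a = present.T @ pos                  # term present, class positive
    b = present.T @ (1.0 - pos)          # term present, class negative
    c = pos.sum() - a                    # term absent, class positive
    d = (1.0 - pos).sum() - b            # term absent, class negative
    num = n * (a * d - b * c) ** 2
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # The numerator is 0 whenever the denominator is 0, so dividing by 1 is safe.
    return num / np.where(denom > 0, denom, 1.0)

def two_stage_reduce(X, y, n_terms, n_concepts):
    """Stage 1: keep the top-scoring terms. Stage 2: LSI via truncated SVD."""
    scores = chi2_scores(X, y)
    keep = np.sort(np.argsort(scores)[::-1][:n_terms])
    X_sel = X[:, keep]
    U, s, Vt = np.linalg.svd(X_sel, full_matrices=False)
    k = min(n_concepts, len(s))
    docs_latent = U[:, :k] * s[:k]       # documents in the k-dim concept space
    return keep, docs_latent, Vt[:k]     # Vt[:k] maps selected terms to concepts

# Hypothetical toy corpus: rows are documents, columns are term counts for
# ["free", "win", "meeting", "report", "the"]; label 1 = spam, 0 = not spam.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 3, 1, 1],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 2, 1],
], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0])

keep, docs_latent, term_concepts = two_stage_reduce(X, y, n_terms=4, n_concepts=2)
print("selected term indices:", keep.tolist())
print("document coordinates in concept space:")
print(np.round(docs_latent, 3))
```

In this toy setting the uninformative term that occurs in every document scores zero and is dropped in stage 1, and stage 2 compresses the remaining terms into two concept dimensions. The reduced document vectors could then be passed to a classifier such as the support vector machine named in the keywords; that step is omitted here.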
Pages: 2793-2800
Number of pages: 8