A two-stage feature selection method for text categorization

Cited by: 43
Authors
Meng, Jiana [1 ,2 ]
Lin, Hongfei [1 ]
Yu, Yuhai [1 ,3 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
[2] Dalian Nationalities Univ, Coll Sci, Dalian 116600, Peoples R China
[3] Dalian Nationalities Univ, Sch Comp Sci & Engn, Dalian 116600, Peoples R China
Keywords
Feature selection; Text categorization; Latent semantic indexing; Support vector machine
DOI
10.1016/j.camwa.2011.07.045
Chinese Library Classification
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
Feature selection for text categorization is a well-studied problem whose goal is to improve the effectiveness of categorization, the efficiency of computation, or both. Text categorization systems based on traditional term matching use the vector space model to represent a document; however, this representation needs a high-dimensional space and does not take into account the semantic relationships between terms, which leads to poor categorization accuracy. The latent semantic indexing method can overcome this problem by replacing the individual terms with statistically derived conceptual indices. To improve both the accuracy and the efficiency of categorization, in this paper we propose a two-stage feature selection method. First, we apply a novel feature selection method to reduce the dimension of the term space; then we construct a new semantic space between terms, based on the latent semantic indexing method. In applications involving spam database categorization, we find that our two-stage feature selection method performs better. (C) 2011 Elsevier Ltd. All rights reserved.
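The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the "novel feature selection method", so a plain chi-square score stands in for stage 1 here, and the toy term-count matrix, the labels, and the names chi_square_scores and two_stage_reduce are all assumptions made for the example. Stage 2 uses a truncated SVD, the usual way to compute latent semantic indexing.

```python
import numpy as np

def chi_square_scores(X, y):
    """Chi-square score of each term against a binary label (1 = spam).

    Assumption: chi-square is a stand-in for the paper's unspecified
    stage-1 selection criterion.
    """
    present = X > 0
    pos = y == 1
    n = X.shape[0]
    a = present[pos].sum(axis=0).astype(float)   # term present, class 1
    b = present[~pos].sum(axis=0).astype(float)  # term present, class 0
    c = pos.sum() - a                            # term absent, class 1
    d = (~pos).sum() - b                         # term absent, class 0
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    safe = np.where(denom > 0, denom, 1.0)       # avoid division by zero
    return np.where(denom > 0, n * (a * d - b * c) ** 2 / safe, 0.0)

def two_stage_reduce(X, y, n_terms, n_concepts):
    """Stage 1: keep the n_terms best-scoring terms.
    Stage 2: project the reduced matrix onto n_concepts LSI dimensions."""
    scores = chi_square_scores(X, y)
    keep = np.sort(np.argsort(scores)[::-1][:n_terms])
    X_sel = X[:, keep].astype(float)
    # LSI: truncated SVD of the document-term matrix.
    U, s, Vt = np.linalg.svd(X_sel, full_matrices=False)
    k = min(n_concepts, len(s))
    docs_lsi = U[:, :k] * s[:k]   # documents in the concept space
    return keep, docs_lsi, Vt[:k]

# Hypothetical toy corpus: rows are documents, columns are term counts
# for the terms ["free", "money", "meeting", "report", "the", "offer"].
X = np.array([
    [2, 1, 0, 0, 1, 1],   # spam
    [1, 2, 0, 0, 1, 0],   # spam
    [0, 0, 2, 1, 1, 0],   # legitimate
    [0, 0, 1, 2, 1, 0],   # legitimate
])
y = np.array([1, 1, 0, 0])

keep, docs_lsi, concepts = two_stage_reduce(X, y, n_terms=4, n_concepts=2)
```

In a pipeline like the one the keywords suggest, the rows of docs_lsi would then be passed to a classifier such as a support vector machine; a new document would be mapped into the same space by selecting the kept term columns and multiplying by the transpose of concepts.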
Pages: 2793 - 2800
Number of pages: 8
Related papers
50 in total
  • [41] Cascaded feature selection in SVMs text categorization
    Masuyama, T
    Nakagawa, H
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 588 - 591
  • [42] Study on constraints for feature selection in text categorization
    Xu, Yan
    Li, Jintao
    Wang, Bin
    Sun, Chunming
    Zhang, Sen
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2008, 45 (04) : 596 - 602
  • [43] A General Framework of Feature Selection for Text Categorization
    Jing, Hongfang
    Wang, Bin
    Yang, Yahui
    Xu, Yan
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 647 - +
  • [44] Two-Stage Method for Clothing Feature Detection
    Lyu, Xinwei
    Li, Xinjia
    Zhang, Yuexin
    Lu, Wenlian
    BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (04)
  • [45] A feature selection and classification technique for text categorization
    Girgis, MR
    Aly, AA
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (04) : 441 - 454
  • [46] Text Categorization Based on Clustering Feature Selection
    Zhou, Xiaofei
    Hu, Yue
    Guo, Li
    2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 398 - 405
  • [47] An examination of feature selection frameworks in text categorization
    How, BC
    Kiong, WT
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2005, 3689 : 558 - 564
  • [48] Feature selection based on feature interactions with application to text categorization
    Tang, Xiaochuan
    Dai, Yuanshun
    Xiang, Yanping
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 120 : 207 - 216
  • [49] A novel two-stage wrapper feature selection approach based on greedy search for text sentiment classification
    Sagbas, Ensar Arif
    NEUROCOMPUTING, 2024, 590
  • [50] Two-Stage Botnet Detection Method Based on Feature Selection for Industrial Internet of Things
    Shu, Jian
    Lu, Jiazhong
    IET INFORMATION SECURITY, 2025, 2025 (01)