Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification

Cited by: 1
Authors
Yi, Junkai [1 ]
Yang, Guang [1 ]
Wan, Jing [1 ]
Affiliation
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Keywords
text classification; text categorization; feature selection; tf-idf; category discrimination; CATEGORIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Improving classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic, widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm neglects a feature term's distribution within a category and among categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate our algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than the traditional tf-idf algorithm and is, on the whole, superior to the other algorithms. It is therefore a more effective feature selection algorithm for text classification.
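As a point of reference for the limitation described in the abstract, the following is a minimal sketch of the traditional tf-idf weighting only. The paper's Category Discrimination factor is not defined in the abstract, so it is not reproduced here. The function name `tf_idf`, the toy documents, and the use of relative term frequency with an unsmoothed natural-log idf are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency and idf = ln(N / df); this is one common
    textbook variant, chosen here as an assumption.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: the first two documents are "finance" texts,
# the third is a "sports" text. The labels are never seen by tf_idf.
docs = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["team", "wins", "match"],
]
weights = tf_idf(docs)
```

In this toy corpus, "stock" occurs in both finance documents and in no sports document, yet it receives a lower weight than "rises", which occurs only once. The weighting never looks at category labels, which is the kind of blind spot the abstract attributes to traditional tf-idf.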
Pages: 1145-1159
Page count: 15
Related papers
50 in total
  • [21] Dynamic Feature Selection Strategy in Incremental Chinese Text Classification
    Yang, Dan
    Fan, Xinghua
    2012 2ND INTERNATIONAL CONFERENCE ON APPLIED ROBOTICS FOR THE POWER INDUSTRY (CARPI), 2012, : 1123 - 1126
  • [22] Research on feature selection method in Chinese text automatic classification
    Hong, Ying
    Geng, Zengmin
    ENERGY SCIENCE AND APPLIED TECHNOLOGY, 2016, : 359 - 361
  • [23] A Text Classification Algorithm based on Feature Weighting
    Yang, Han
    Cui, Honggang
    Tang, Hao
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [24] Research on the Feature Selection Algorithm of Chinese News Classification
    Gong, Jun-peng
    Wen, Yu-jun
    Song, Qing
    INTERNATIONAL CONFERENCE ON SIMULATION, MODELLING AND MATHEMATICAL STATISTICS (SMMS 2015), 2015, : 455 - 458
  • [25] A NEW FEATURE SELECTION METHOD BASED ON CONCEPT EXTRACTION IN AUTOMATIC CHINESE TEXT CLASSIFICATION
    Liao, Shasha
    Jiang, Minghu
    NEW MATHEMATICS AND NATURAL COMPUTATION, 2007, 3 (03) : 331 - 347
  • [26] A Two-stage Text Feature Selection Algorithm for Improving Text Classification
    Ashokkumar, P.
    Shankar, Siva G.
    Srivastava, Gautam
    Maddikunta, Praveen Kumar Reddy
    Gadekallu, Thippa Reddy
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (03)
  • [27] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [28] A two-stage Markov blanket based feature selection algorithm for text classification
    Javed, Kashif
    Maruf, Sameen
    Babri, Haroon A.
    NEUROCOMPUTING, 2015, 157 : 91 - 104
  • [29] Classification Algorithm Based on Feature Selection and Samples Selection
    Xu, Yitian
    Zhen, Ling
    Yang, Liming
    Wang, Laisheng
    ADVANCES IN NEURAL NETWORKS - ISNN 2009, PT 2, PROCEEDINGS, 2009, 5552 : 631 - 638
  • [30] Utility-based feature selection for text classification
    Heyong Wang
    Ming Hong
    Raymond Yiu Keung Lau
    Knowledge and Information Systems, 2019, 61 : 197 - 226