Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification

Times cited: 1
Authors
Yi, Junkai [1]
Yang, Guang [1]
Wan, Jing [1]
Affiliation
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Keywords
text classification; text categorization; feature selection; tf-idf; category discrimination; CATEGORIZATION;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
How to improve classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm ignores how a feature term is distributed within a category and across categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate the algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than the traditional tf-idf algorithm and outperforms the other algorithms overall. It is therefore a more effective feature selection algorithm for text classification.
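The abstract does not state the Category Discrimination formula, so the following is only a minimal sketch of the traditional tf-idf baseline it criticizes. It shows why plain tf-idf cannot account for category distribution: category labels never enter the computation. Several details are assumptions, because tf-idf has many variants. These are the tf normalization (count divided by document length), idf = log(N / df), and scoring each term by its maximum weight in any document. The function names and the English tokens, which stand in for segmented Chinese words, are also hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def max_tfidf(docs):
    """Score each term by its highest tf-idf weight in any document (assumed rule)."""
    best = {}
    for doc_weights in tfidf_weights(docs):
        for term, score in doc_weights.items():
            best[term] = max(best.get(term, 0.0), score)
    return best


def select_features(docs, k):
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    scores = max_tfidf(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical segmented documents. The category names appear only in the
# comments, because traditional tf-idf never looks at class labels.
docs = [
    ["team", "match", "today"],  # sports
    ["team", "goal"],            # sports
    ["stock", "today"],          # finance
    ["stock", "rise"],           # finance
]

scores = max_tfidf(docs)
print(select_features(docs, 2))          # ['goal', 'rise']
print(scores["team"] == scores["today"])  # True
```

In this toy corpus, "team" occurs only in sports documents, while "today" is split evenly across both categories. Both terms still receive the same score, log(2)/2. This is the kind of category-blind result the abstract attributes to traditional tf-idf, and the gap a category-aware weighting is meant to close.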
Pages: 1145-1159
Number of pages: 15