Improved Mutual Information Method For Text Feature Selection

Cited by: 0
Authors
Ding Xiaoming [1]
Tang Yan [1]
Affiliation
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
Keywords
text classification; feature selection; mutual information
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Reducing the dimensionality of a high-dimensional feature set is one of the main difficulties in text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class. An experimental platform was built by constructing a Chinese text classification system, and multiple sets of experiments were run on this system. The results show that the new feature selection approach gives better results in text categorization.
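For orientation, the traditional measure that the abstract criticizes can be sketched as below. This is only the standard textbook term/category mutual information, log(P(t, c) / (P(t) P(c))), estimated from document counts. It is not the authors' improved measure, whose formula the abstract does not give. The function name `mutual_information` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Score each (term, category) pair with the traditional MI measure.

    docs   : list of token lists, one per document
    labels : list of category names, aligned with docs
    Probabilities are estimated as fractions of documents, so how often
    a term occurs inside a document is ignored (hypothetical helper).
    """
    n_docs = len(docs)
    docs_in_class = Counter(labels)
    docs_with_term = Counter()
    docs_with_term_in_class = defaultdict(Counter)

    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # presence only, not frequency
            docs_with_term[term] += 1
            docs_with_term_in_class[label][term] += 1

    scores = {}
    for label, term_counts in docs_with_term_in_class.items():
        for term, joint in term_counts.items():
            # log( P(t, c) / (P(t) * P(c)) ) with document-fraction estimates
            scores[(term, label)] = math.log(
                (joint * n_docs) / (docs_with_term[term] * docs_in_class[label])
            )
    return scores


# Toy corpus, invented purely for illustration.
docs = [
    ["goal", "match"],
    ["goal", "team"],
    ["stock", "market"],
    ["stock", "goal"],
]
labels = ["sports", "sports", "finance", "finance"]
scores = mutual_information(docs, labels)
```

In this toy corpus, "match" appears in one sports document and "stock" in both finance documents, yet the two pairs receive the same score. That is the kind of insensitivity to in-class frequency that the abstract says the improved measure is meant to address.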
Pages: 163-166
Number of pages: 4