Improved Mutual Information Method For Text Feature Selection

Cited by: 0
Authors
Ding Xiaoming [1]
Tang Yan [1]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
Keywords
text classification; feature selection; mutual information
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Reducing the dimensionality of a high-dimensional feature set is one of the difficulties of text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class, builds an experimental platform by constructing a Chinese text classification system, and runs multiple sets of experiments on this system. The results show that the new feature selection approach yields better results in text categorization.
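The limitation named in the abstract can be illustrated with a minimal sketch of the classical, document-count form of mutual information, MI(t, c) = log(P(t, c) / (P(t) P(c))). This is only the traditional baseline; the abstract does not give the formula of the improved measure, so the in-class frequency and dispersion terms are not reproduced here. The function names, the toy corpus, and the category labels below are hypothetical and chosen purely for illustration.

```python
import math


def mutual_information(docs, labels, term, category):
    """Classical pointwise MI between a term and a category.

    Uses document counts only: a document contributes the same amount
    whether the term occurs in it once or many times.
    """
    n = len(docs)
    pairs = list(zip(docs, labels))
    a = sum(1 for d, y in pairs if term in d and y == category)      # term present, in category
    b = sum(1 for d, y in pairs if term in d and y != category)      # term present, other category
    c = sum(1 for d, y in pairs if term not in d and y == category)  # term absent, in category
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((a * n) / ((a + c) * (a + b)))


def term_frequency_in_class(docs, labels, term, category):
    """Total number of occurrences of the term inside one category."""
    return sum(d.count(term) for d, y in zip(docs, labels) if y == category)


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["goal", "goal", "match"],
    ["goal", "team"],
    ["stock", "market"],
    ["market", "goal"],
]
labels = ["sports", "sports", "finance", "finance"]

for term in ("goal", "match"):
    mi = mutual_information(docs, labels, term, "sports")
    tf = term_frequency_in_class(docs, labels, term, "sports")
    print(term, round(mi, 4), tf)
```

In this toy corpus the rare term "match" receives a higher classical score than "goal", even though "goal" occurs more often within the "sports" class. That is the kind of behaviour a frequency-aware and dispersion-aware variant would aim to correct.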
Pages: 163-166
Page count: 4
Related papers
50 in total
  • [1] Feature selection using improved mutual information for text classification
    Novovicová, J
    Malík, A
    Pudil, P
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2004, 3138 : 1010 - 1017
  • [2] Feature selection algorithm for text classification based on improved mutual information
    丛帅
    张积宾
    徐志明
    王宇颖
    Journal of Harbin Institute of Technology(New series), 2011, (03) : 144 - 148
  • [3] Discriminant Mutual Information for Text Feature Selection
    Wang, Jiaqi
    Zhang, Li
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 136 - 151
  • [4] Feature Selection Method Based on the Improved of Mutual Information and Genetic Algorithm
    Qiu Ye
    Liu Peiyu
    Yang Yuzhen
    2009 IEEE INTERNATIONAL SYMPOSIUM ON IT IN MEDICINE & EDUCATION, VOLS 1 AND 2, PROCEEDINGS, 2009, : 836 - 839
  • [5] Feature Selection for Text Classification Using Mutual Information
    Sel, Ilhami
    Karci, Ali
    Hanbay, Davut
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [6] An Improved Feature Selection for Categorization Based on Mutual Information
    Liu, Haifeng
    Su, Zhan
    Yao, Zeqing
    Liu, Shousheng
    WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 80 - 87
  • [7] Improved Feature Selection Based On Normalized Mutual Information
    Li Yin
    Ma Xingfei
    Yang Mengxi
    Zhao Wei
    Gu Wenqiang
    14TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS FOR BUSINESS, ENGINEERING AND SCIENCE (DCABES 2015), 2015, : 518 - 522
  • [8] Mutual Information Using Sample Variance for Text Feature Selection
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON COMMUNICATION AND INFORMATION PROCESSING (ICCIP 2017), 2017, : 39 - 44
  • [9] Study of E-mail Filtering Based on Mutual Information Text Feature Selection Method
    Gong, Shangfu
    Gong, Xingyu
    Wang, Yuan
    INSTRUMENTATION, MEASUREMENT, CIRCUITS AND SYSTEMS, 2012, 127 : 33 - 39
  • [10] An improved Fuzzy Mutual Information Feature Selection for Classification Systems
    Wang, Liwei
    Salem, Omar A. M.
    2017 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS 2017), 2017, : 119 - 124