Weighted Document Frequency for Feature Selection in Text Classification

被引:0
|
作者
Li, Baoli [1 ]
Yan, Qiuling [1 ]
Xu, Zhenqiang [1 ]
Wang, Guicai [1 ]
机构
[1] Henan Univ Technol, Coll Informat Sci & Engn, Zhengzhou, Peoples R China
关键词
Document Frequency; Weighted Document Frequency; Feature Selection; Text Classification; Text Categorization;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In the past research, Document Frequency (DF) has been validated to be a simple yet quite effective measure for feature selection in text classification. The calculation is based on how many documents in a collection contain a feature, which can be a word, a phrase, a n-gram, or a specially derived attribute. The counting process takes a binary strategy: if a feature appears in a document, its DF will be increased by one. This traditional DF metric concerns only about whether a feature appears in a document, but does not consider how important the feature is in that document. Obviously, thus counted document frequency is very likely to introduce much noise. Therefore, a weighted document frequency (WDF) is proposed and expected to reduce such noise to some extent. Extensive experiments on two text classification data sets demonstrate the effectiveness of the proposed measure.
引用
收藏
页码:132 / 135
页数:4
相关论文
共 50 条
  • [31] A Bayesian feature selection paradigm for text classification
    Feng, Guozhong
    Guo, Jianhua
    Jing, Bing-Yi
    Hao, Lizhu
    INFORMATION PROCESSING & MANAGEMENT, 2012, 48 (02) : 283 - 302
  • [32] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [33] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01): : 103 - 110
  • [34] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [35] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [36] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184
  • [37] A feature selection and classification technique for text categorization
    Girgis, MR
    Aly, AA
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (04) : 441 - 454
  • [38] Feature selection improves text classification accuracy
    不详
    IEEE INTELLIGENT SYSTEMS, 2005, 20 (06) : 75 - 75
  • [39] A new approach to feature selection in text classification
    Wang, Y
    Wang, XJ
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 3814 - 3819
  • [40] Higher order feature selection for text classification
    Jan Bakus
    Mohamed S. Kamel
    Knowledge and Information Systems, 2006, 9 : 468 - 491