Weighted Document Frequency for Feature Selection in Text Classification

Citations: 0
Authors
Li, Baoli [1]
Yan, Qiuling [1]
Xu, Zhenqiang [1]
Wang, Guicai [1]
Affiliations
[1] Henan Univ Technol, Coll Informat Sci & Engn, Zhengzhou, Peoples R China
Keywords
Document Frequency; Weighted Document Frequency; Feature Selection; Text Classification; Text Categorization;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Previous research has shown Document Frequency (DF) to be a simple yet quite effective measure for feature selection in text classification. DF is calculated from the number of documents in a collection that contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. The counting is binary: each document in which a feature appears increases that feature's DF by one. This traditional DF metric therefore considers only whether a feature appears in a document, not how important the feature is in that document. Document frequency counted in this way is very likely to introduce considerable noise. A weighted document frequency (WDF) is therefore proposed, and is expected to reduce such noise to some extent. Extensive experiments on two text classification data sets demonstrate the effectiveness of the proposed measure.
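The contrast between binary and weighted counting can be illustrated with a minimal Python sketch. The abstract does not give the WDF formula, so the weight used here (a feature's relative term frequency within each document) is only an assumption chosen for illustration, not the authors' definition. The function names and the toy corpus are likewise hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Binary DF: every document that contains a feature adds exactly 1."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeats inside one document count once
    return df


def weighted_document_frequency(docs):
    """Illustrative WDF (assumed weighting, not taken from the paper):
    every document adds the feature's relative term frequency,
    i.e. its count in the document divided by the document length."""
    wdf = Counter()
    for tokens in docs:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        total = len(tokens)
        for term, count in Counter(tokens).items():
            wdf[term] += count / total
    return wdf


def top_k(scores, k):
    """Select the k highest-scoring features, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["stock", "stock", "market", "price"],
    ["stock", "stock", "stock", "trade"],
    ["game", "team", "score", "market"],
    ["team", "game", "game", "market"],
]

df = document_frequency(docs)
wdf = weighted_document_frequency(docs)
```

In this toy corpus, "market" occurs once in each of three documents while "stock" dominates two of them, so the two measures rank the features differently. Whether a given weighting actually reduces noise on real data is an empirical question that this sketch does not address.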
Pages: 132 - 135
Number of pages: 4
Related papers
50 items in total
  • [41] Composite Feature Extraction and Selection for Text Classification
    Wan, Chuan
    Wang, Yuling
    Liu, Yaoze
    Ji, Jinchao
    Feng, Guozhong
    IEEE ACCESS, 2019, 7 : 35208 - 35219
  • [42] Higher order feature selection for text classification
    Bakus, J
    Kamel, MS
    KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (04) : 468 - 491
  • [43] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [44] Optimal Feature Selection for Imbalanced Text Classification
    Khurana A.
    Verma O.P.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 135 - 147
  • [45] Document Classification with a weighted Frequency Pattern tree algorithm
    Dsouza, Froila Helixia
    Ananthanarayana, V. S.
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON DATA MINING AND ADVANCED COMPUTING (SAPIENCE), 2016, : 29 - 34
  • [46] Interactions between document representation and feature selection in text categorization
    Radovanovic, Milos
    Ivanovic, Mirjana
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, 4080 : 489 - 498
  • [47] A COMBINED APPROACH FOR FILTER FEATURE SELECTION IN DOCUMENT CLASSIFICATION
    Le Nguyen Hoai Nam
    Ho Bao Quoc
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 317 - 324
  • [48] Hybrid Feature Selection for Amharic News Document Classification
    Endalie, Demeke
    Haile, Getamesay
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021
  • [49] Feature selection method based on improved document frequency
    Zheng, Wei
    Feng, Guohe
    Telkomnika (Telecommunication Computing Electronics and Control), 2014, 12 (04) : 905 - 910
  • [50] A Weighted Classification Method Based on Adaptive Feature Selection
    Ni, Ruizheng
    Qiu, Ruichang
    Luo, Zhiwei
    Chen, Jie
    Jin, Zheming
    Liu, Zhigang
    IEEE ACCESS, 2022, 10 : 58635 - 58646