Contextual feature selection for text classification

Cited by: 6
Authors
Paradis, Francois [1]
Nie, Jian-Yun [1]
Affiliation
[1] Univ Montreal, DIRO, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
classification; named entities; feature selection; text filtering
DOI
10.1016/j.ipm.2006.07.006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We present a simple approach for the classification of "noisy" documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical contents. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters-21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant. © 2006 Published by Elsevier Ltd.
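The abstract describes the approach only at a high level. As a rough illustration of combining conventional feature selection with contextual passage filtering, here is a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: bigrams are ranked with a per-class chi-square score (the abstract does not name the selection criterion), named entities are left out because they would require an external tagger, "passages around selected features" are read as the tokens within a fixed window around each occurrence of a selected bigram (kept, while the rest of the document is discarded), and the function names and the `k` and `window` parameters are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer (assumption); a real system would use a proper one.
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    # Set of adjacent token pairs, e.g. ("call", "for"), ("for", "tender").
    return set(zip(tokens, tokens[1:]))


def chi_square(a, b, c, d):
    # 2x2 chi-square statistic for one feature and one class:
    # a = in-class docs containing the feature, b = out-of-class docs containing it,
    # c = in-class docs without it,            d = out-of-class docs without it.
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_bigrams(docs, labels, k):
    # Conventional, context-free selection (hypothetical stand-in): rank each bigram
    # by its best per-class chi-square score and keep the top k (ties broken alphabetically).
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()           # bigram -> number of documents containing it
    df_by_class = Counter()  # (bigram, label) -> documents of that class containing it
    for doc, label in zip(docs, labels):
        for pair in bigrams(tokenize(doc)):
            df[pair] += 1
            df_by_class[(pair, label)] += 1
    scores = {}
    for pair, total in df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = df_by_class[(pair, label)]
            b = total - a
            c = size - a
            d = (n_docs - size) - b
            best = max(best, chi_square(a, b, c, d))
        scores[pair] = best
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return set(ranked[:k])


def contextual_filter(text, selected, window):
    # Keep only tokens within `window` positions of an occurrence of a selected
    # bigram; everything else is treated as non-topical and dropped.
    tokens = tokenize(text)
    keep = [False] * len(tokens)
    for i, pair in enumerate(zip(tokens, tokens[1:])):
        if pair in selected:
            # The bigram occupies positions i and i + 1; extend by `window` on both sides.
            for j in range(max(0, i - window), min(len(tokens), i + 2 + window)):
                keep[j] = True
    return [tok for tok, kept in zip(tokens, keep) if kept]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    train = [
        "call for tender road construction contract deadline",
        "call for tender software maintenance contract",
        "weather today sunny with light wind",
        "weather forecast rain tomorrow",
    ]
    labels = ["tender", "tender", "other", "other"]
    selected = select_bigrams(train, labels, k=2)
    page = "menu home about login call for tender bridge repair contract copyright privacy terms"
    print(sorted(selected))
    print(contextual_filter(page, selected, window=2))
```

On the toy data, the two bigrams shared by the tender documents ("call for", "for tender") score highest, and the sample page is cut down to the tokens around them, dropping the leading and trailing boilerplate. The window also drops "contract", which is topical, so in this sketch the window size is a trade-off between discarding noise and losing useful context.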
Pages: 344-352
Number of pages: 9
Related papers (50 in total)
  • [31] A New Filter Feature Selection Method for Text Classification
    Cekik, Rasim
    IEEE ACCESS, 2024, 12 : 139316 - 139335
  • [32] A Comparative Study on Feature Selection in Unbalance Text Classification
    Xu, Yan
    2012 INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING (ISISE), 2012, : 44 - 47
  • [33] Utility-based feature selection for text classification
    Wang, Heyong
    Hong, Ming
    Lau, Raymond Yiu Keung
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 197 - 226
  • [34] Feature Selection For Text Classification Using Genetic Algorithms
    Bidi, Noria
    Elberrichi, Zakaria
    PROCEEDINGS OF 2016 8TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION & CONTROL (ICMIC 2016), 2016, : 806 - 810
  • [36] Feature Selection by Using Heuristic Methods for Text Classification
    Sel, Ilhami
    Yeroglu, Celalettin
    Hanbay, Davut
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [37] Two-Stage Feature Selection for Text Classification
    Ozgur, Levent
    Gungor, Tunga
    INFORMATION SCIENCES AND SYSTEMS 2015, 2016, 363 : 329 - 337
  • [38] Comparison of feature selection methods in Kurdish text classification
    Saeed, Ari M.
    Badawi, Soran
    Ahmed, Sara A.
    Hassan, Diyari A.
    IRAN JOURNAL OF COMPUTER SCIENCE, 2024, 7 (1) : 55 - 64
  • [39] An application of MOGW optimization for feature selection in text classification
    Asgarnezhad, Razieh
    Monadjemi, S. Amirhassan
    Soltanaghaei, Mohammadreza
    JOURNAL OF SUPERCOMPUTING, 2021, 77 : 5806 - 5839
  • [40] An improved global feature selection scheme for text classification
    Uysal, Alper Kursat
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 43 : 82 - 92