Contextual feature selection for text classification

Cited by: 6
Authors
Paradis, Francois [1]
Nie, Jian-Yun [1]
Affiliations
[1] Univ Montreal, DIRO, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
classification; named entities; feature selection; text filtering
DOI
10.1016/j.ipm.2006.07.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a simple approach for the classification of "noisy" documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call-for-tender documents, the method can be useful for other web collections that also contain non-topical content. Experiments are conducted on our in-house collection as well as on the 4-Universities data set, Reuters 21578 and 20 Newsgroups. We find a significant improvement on our collection and the 4-Universities data set (10.9% and 4.1%, respectively). Although the best results are obtained by combining bigrams and named entities, the impact of the latter is not found to be significant. (c) 2006 Published by Elsevier Ltd.
Pages: 344-352
Number of pages: 9
Related papers
50 in total
  • [21] Composite Feature Extraction and Selection for Text Classification
    Wan, Chuan
    Wang, Yuling
    Liu, Yaoze
    Ji, Jinchao
    Feng, Guozhong
    IEEE ACCESS, 2019, 7 : 35208 - 35219
  • [22] Higher order feature selection for text classification
    Jan Bakus
    Mohamed S. Kamel
    Knowledge and Information Systems, 2006, 9 : 468 - 491
  • [23] Optimal Feature Selection for Imbalanced Text Classification
    Khurana A.
    Verma O.P.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 135 - 147
  • [24] Higher order feature selection for text classification
    Bakus, J
    Kamel, MS
    KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (04) : 468 - 491
  • [25] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [26] Effective Text Classification by a Supervised Feature Selection Approach
    Basu, Tanmay
    Murthy, C. A.
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, : 918 - 925
  • [27] A feature selection algorithm with redundancy reduction for text classification
    Saleh, Sherine Nagi
    El-Sonbaty, Yasser
    2007 22ND INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2007, : 130 - +
  • [28] Two new feature selection metrics for text classification
    Sahin, Durmus Ozkan
    Kilic, Erdal
    AUTOMATIKA, 2019, 60 (02) : 162 - 171
  • [29] An application of MOGW optimization for feature selection in text classification
    Asgarnezhad, Razieh
    Monadjemi, S. Amirhassan
    Soltanaghaei, Mohammadreza
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (06) : 5806 - 5839
  • [30] Feature selection in text classification via SVM and LSI
    Wang, Ziqiang
    Zhang, Dexian
    ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 1381 - 1386