A Comparative Study on Feature Window Selection in Text Filtering

Cited by: 0
Authors
Hu Quan [1]
Xie Fang [2]
Liu Xiaoguang [3]
Affiliations
[1] Huazhong Normal Univ, Coll Phys Sci & Technol, Wuhan 430079, Peoples R China
[2] Hubei Univ Technol, Coll Comp Sci, Wuhan 430068, Peoples R China
[3] Nankai Univ, Coll Informat Technol, Tianjin 300071, Peoples R China
Keywords
text filtering; feature vector; feature window; matching algorithm
DOI
10.1109/IFITA.2009.189
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text representation is a preliminary step in text filtering, and the vector space model (VSM) is the most commonly used method for it. However, the document feature set produced by VSM usually has very high dimensionality, and as a result the distribution of feature values tends to be highly skewed. This paper presents some new mechanisms to mitigate these problems. With these mechanisms, document features are extracted not from the full text but from smaller feature windows, such as sentences, graphs and blocks, and the relevant texts are finally evaluated by local similarity. These windows are obtained by analyzing the linguistic structure of the documents. As a result, the approach can have a notable effect on the precision of text filtering.
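The idea of scoring a document by local rather than full-text similarity can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their matching algorithm, so this example assumes sentence-level windows, raw term-frequency vectors, cosine similarity, and a maximum over windows as the local score. All function names (`tokenize`, `sentence_windows`, `cosine`, `global_similarity`, `local_similarity`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_windows(text):
    """Split text into sentence-sized windows (assumed window type)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def global_similarity(profile, document):
    """Full-text VSM baseline: one vector for the whole document."""
    return cosine(Counter(tokenize(profile)), Counter(tokenize(document)))


def local_similarity(profile, document):
    """Best match between the profile and any single window.

    Taking the maximum over windows is an assumption made for this
    sketch; other aggregations (mean, top-k) are equally possible.
    """
    p = Counter(tokenize(profile))
    scores = (cosine(p, Counter(tokenize(w))) for w in sentence_windows(document))
    return max(scores, default=0.0)


profile = "text filtering"
document = (
    "Text filtering matters. "
    "The weather was cold and the train was late again today."
)
print(global_similarity(profile, document))
print(local_similarity(profile, document))
```

In this toy input the relevant sentence is diluted by unrelated text in the full-document vector, so the window-level score comes out higher than the full-text score. This is one plausible reading of why smaller feature windows could help precision.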
Pages: 209 / +
Page count: 2
Related papers
50 in total
  • [21] A comparative study on feature weight in text categorization
    Deng, ZH
    Tang, SW
    Yang, DQ
    Zhang, M
    Li, LY
    Xie, KQ
    ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 588 - 597
  • [22] A comparative study on feature weight in text categorization
    Deng, Zhi-Hong
    Tang, Shi-Wei
    Yang, Dong-Qing
    Zhang, Ming
    Li, Li-Yu
    Xie, Kun-Qing
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004, 3007 : 588 - 597
  • [23] A parallel feature selection method study for text classification
    Li, Zhao
    Lu, Wei
    Sun, Zhanquan
    Xing, Weiwei
    NEURAL COMPUTING & APPLICATIONS, 2017, 28 : S513 - S524
  • [24] An Experimental Study of Feature Selection Methods for Text Classification
    Uchyigit, Gulden
    Clark, Keith
    PERSONALIZATION TECHNIQUES AND RECOMMENDER SYSTEMS, 2008, : 303 - 320
  • [25] A parallel feature selection method study for text classification
    Zhao Li
    Wei Lu
    Zhanquan Sun
    Weiwei Xing
    Neural Computing and Applications, 2017, 28 : 513 - 524
  • [26] An extensive empirical study of feature selection for text categorization
    Qiu, Li-Qing
    Zhao, Ru-Yi
    Zhou, Gang
    Yi, Sheng-Wei
    7TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE IN CONJUNCTION WITH 2ND IEEE/ACIS INTERNATIONAL WORKSHOP ON E-ACTIVITY, PROCEEDINGS, 2008, : 312 - 315
  • [27] Feature selection for spam filtering
    Menghour, Kamilia
    Souici-Meslati, Labiba
    CORIA 2010: Actes de la COnference en Recherche d'Information et Applications - Proceedings of the Conference on Information Retrieval and Applications, 2010, : 349 - 360
  • [28] Feature selection for airborne LiDAR data filtering: a mutual information method with Parzon window optimization
    Cai, Zhan
    Ma, Hongchao
    Zhang, Liang
    GISCIENCE & REMOTE SENSING, 2020, 57 (03) : 323 - 337
  • [29] Support vector machines based arabic language text classification system: Feature selection comparative study
    Mesleh, Abdelwadood
    APPLIED MATHEMATICS FOR SCIENCE AND ENGINEERING, 2007, : 228 - +
  • [30] Support Vector Machines Based Arabic Language Text Classification System: Feature Selection Comparative Study
    Mesleh, Abdelwadood Moh'd
    ADVANCES IN COMPUTER AND INFORMATION SCIENCES AND ENGINEERING, 2008, : 11 - 16