Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches

Cited by: 6
Authors
Delany, Sarah Jane [2]
Bridge, Derek [1]
Affiliations
[1] Univ Coll Cork, Cork, Ireland
[2] Dublin Inst Technol, Dublin, Ireland
Keywords
spam filtering; case-based reasoning; case-base editing; case-based maintenance; feature selection; distance measures; text compression
DOI
10.1007/s10462-007-9041-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spam filtering is a text classification task to which Case-Based Reasoning (CBR) has been successfully applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advantages of having no set-up costs and being resilient to concept drift. We report an empirical comparison, which shows the feature-free approach to be more accurate than the feature-based system. These results are fairly robust over different compression algorithms in that we find that the accuracy when using a Lempel-Ziv compressor (GZip) is approximately the same as when using a statistical compressor (PPM). We note, however, that the feature-free systems take much longer to classify emails than the feature-based system. Improvements in the classification time of both kinds of systems can be obtained by applying case base editing algorithms, which aim to remove noisy and redundant cases from a case base while maintaining, or even improving, generalisation accuracy. We report empirical results using the Competence-Based Editing (CBE) technique. We show that CBE removes more cases when we use the distance measure based on text compression (without significant changes in generalisation accuracy) than it does when we use the feature-based approach.
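The feature-free idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact distance formula, so this example assumes the normalized compression distance (NCD), a well-known compression-based measure that may differ from the one the authors used. It also assumes Python's zlib (DEFLATE, a Lempel-Ziv compressor of the kind used by GZip) and a plain k-nearest-neighbour majority vote. The names `compressed_size`, `ncd` and `classify` are hypothetical and are not taken from the ECUE system.

```python
import zlib
from collections import Counter


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (an assumed stand-in measure).

    Small when compressing x and y together costs little more than
    compressing the larger of the two alone, i.e. when they share content.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, case_base: list[tuple[str, str]], k: int = 3) -> str:
    """Label a query email by majority vote over its k nearest cases."""
    nearest = sorted(case_base, key=lambda case: ncd(query, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy case base of (raw email text, label) pairs; no features are extracted.
case_base = [
    ("Win a free prize now, click here to claim your reward today", "spam"),
    ("Cheap loans approved instantly, no credit check required", "spam"),
    ("Please find attached the minutes from Monday's project meeting", "ham"),
    ("Can we move our lunch to Thursday? I have a dentist appointment", "ham"),
]
label = classify("Click here to claim your free prize reward now", case_base, k=1)
print(label)
```

Because every distance computation compresses the concatenated pair of emails, classifying one query costs one compression per stored case. This is consistent with the abstract's remark that the feature-free systems classify more slowly, and it is why removing cases through case base editing reduces classification time.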
Pages: 75-87
Page count: 13
Related papers
  • [1] Textual case-based reasoning for spam filtering: a comparison of feature-based and feature-free approaches
    Sarah Jane Delany
    Derek Bridge
    Artificial Intelligence Review, 2006, 26 : 75 - 87
  • [2] Catching the drift: Using feature-free case-based reasoning for spam filtering
    Delany, Sarah Jane
    Bridge, Derek
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2007, 4626 : 314 - +
  • [3] Automated Algorithm Selection: from Feature-Based to Feature-Free Approaches
    Alissa, Mohamad
    Sim, Kevin
    Hart, Emma
    JOURNAL OF HEURISTICS, 2023, 29 (01) : 1 - 38
  • [4] Automated Algorithm Selection: from Feature-Based to Feature-Free Approaches
    Mohamad Alissa
    Kevin Sim
    Emma Hart
    Journal of Heuristics, 2023, 29 : 1 - 38
  • [5] Relaxing feature selection in spam filtering by using case-based reasoning systems
    Mendez, J. R.
    Fdez-Riverola, F.
    Glez-Pena, D.
    Diaz, F.
    Corchado, J. M.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 53 - +
  • [6] An Assessment of Case-Based Reasoning for Spam Filtering
    Sarah Jane Delany
    Pádraig Cunningham
    Lorcan Coyle
    Artificial Intelligence Review, 2005, 24 : 359 - 378
  • [7] An assessment of case-based reasoning for spam filtering
    Delany, SJ
    Cunningham, P
    Coyle, L
    ARTIFICIAL INTELLIGENCE REVIEW, 2005, 24 (3-4) : 359 - 378
  • [8] Macro and micro applications of case-based reasoning to feature-based product selection
    Saward, G
    O'Dell, T
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XVII, 2001, : 102 - 114
  • [9] Case-Based Reasoning with Feature Clustering
    Hong, Tzung-Pei
    Liou, Yan-Liang
    PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS, 2008, : 449 - +
  • [10] Rough set based approaches to feature selection for Case-Based Reasoning classifiers
    Salamo, Maria
    Lopez-Sanchez, Maite
    PATTERN RECOGNITION LETTERS, 2011, 32 (02) : 280 - 292