Compression-based spam filter

Times cited: 6
Authors
Almeida, Tiago A. [1]
Yamakami, Akebo [2]
Affiliations
[1] Fed Univ Sao Carlos UFSCar, Dept Comp Sci, BR-18052780 Sorocaba, SP, Brazil
[2] Univ Campinas UNICAMP, Sch Elect & Comp Engn, BR-13083970 Campinas, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP)
Keywords
compression-based model; spam filter; text categorization; knowledge-based system; machine learning; classification
DOI
10.1002/sec.639
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Nowadays, e-mail spam is not a novelty, but it is still an important problem with a high impact on the economy. Spam filtering poses a special problem in text categorization, in which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. In this paper, we present a novel approach to spam filtering based on a compression-based model. We have conducted an empirical experiment on eight public and real non-encoded datasets. The results indicate that the proposed filter is fast to construct, is incrementally updateable, and clearly outperforms established spam classifiers. Copyright (c) 2012 John Wiley & Sons, Ltd.
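The abstract does not describe the model itself. As a rough illustration of the general idea behind compression-based text classification, and not the authors' model, the sketch below assigns a message to whichever class's training text lets a standard compressor encode it with the fewest extra bytes. It assumes zlib (DEFLATE) as a stand-in compressor. The class name `CompressionClassifier` and its methods are hypothetical. Training only appends text to a per-class corpus, which hints at why filters of this kind can be updated incrementally.

```python
import zlib


class CompressionClassifier:
    """Hypothetical compression-based classifier (illustration only).

    Uses zlib/DEFLATE as a stand-in compressor; this is NOT the model
    proposed in the paper.
    """

    def __init__(self, level: int = 9) -> None:
        self.level = level
        self.corpora: dict[str, bytes] = {}  # class label -> concatenated training text

    def _size(self, data: bytes) -> int:
        # Length in bytes of the zlib-compressed data.
        return len(zlib.compress(data, self.level))

    def update(self, label: str, message: str) -> None:
        # Incremental training: append the message to its class corpus.
        prev = self.corpora.get(label, b"")
        self.corpora[label] = prev + message.encode("utf-8") + b"\n"

    def cost(self, label: str, message: str) -> int:
        # Extra compressed bytes needed to encode the message after the class corpus.
        corpus = self.corpora[label]
        return self._size(corpus + message.encode("utf-8")) - self._size(corpus)

    def classify(self, message: str) -> str:
        # Pick the class whose corpus "explains" the message most cheaply.
        return min(self.corpora, key=lambda label: self.cost(label, message))
```

A typical use would call `update("spam", text)` or `update("ham", text)` for each labelled message and then `classify(text)` for new mail. One design caveat: DEFLATE only looks back 32 KB, so with large corpora a stronger adaptive model (for example, a context-modelling compressor) would be needed for the comparison to stay meaningful.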
Pages: 327–335
Number of pages: 9