Applying lazy learning algorithms to tackle concept drift in spam filtering

Cited by: 68
Authors
Fdez-Riverola, F.
Iglesias, E. L.
Diaz, F.
Mendez, J. R.
Corchado, J. M.
Affiliations
[1] Univ Vigo, Dept Informat, Escuela Super Ingn Informat, Orense 32004, Spain
[2] Univ Valladolid, Escuela Univ Informat, Dept Informat, Segovia 40005, Spain
[3] Univ Salamanca, Dept Informat & Automat, E-37008 Salamanca, Spain
Keywords
IBR system; concept drift; anti-spam filtering; model evaluation
DOI
10.1016/j.eswa.2006.04.011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A great number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, a difficulty in many real-world applications is that the distribution underlying the data is likely to change over time. In these situations, a problem that many global eager learners face is their inability to adapt to local concept drift. Concept drift in spam is particularly difficult to handle because spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mail) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques able to select both the terms and the e-mails that are representative of the current situation. The enhanced system is evaluated against other well-known, successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The results obtained from the experiments are very promising and support the idea that instance-based reasoning systems can offer a number of advantages in tackling concept drift in dynamic problems, as in the case of the anti-spam filtering domain. © 2006 Elsevier Ltd. All rights reserved.
Pages: 36-48
Page count: 13