Applying lazy learning algorithms to tackle concept drift in spam filtering

Cited by: 68
Authors
Fdez-Riverola, F.
Iglesias, E. L.
Diaz, F.
Mendez, J. R.
Corchado, J. M.
Affiliations
[1] Univ Vigo, Dept Informat, Escuela Super Ingn Informat, Orense 32004, Spain
[2] Univ Valladolid, Escuela Univ Informat, Dept Informat, Segovia 40005, Spain
[3] Univ Salamanca, Dept Informat & Automat, E-37008 Salamanca, Spain
Keywords
IBR system; concept drift; anti-spam filtering; model evaluation
DOI
10.1016/j.eswa.2006.04.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A great number of machine learning techniques have been applied to problems where data is collected over an extended period of time. However, in many real-world applications the distribution underlying the data is likely to change over time. In these situations, many global eager learners are unable to adapt to local concept drift. Concept drift in spam is particularly difficult to handle because spammers actively change the nature of their messages to elude spam filters. Algorithms that track concept drift must be able to identify a change in the target concept (spam or legitimate e-mails) without direct knowledge of the underlying shift in distribution. In this paper we show how a previously successful instance-based reasoning e-mail filtering model can be improved to better track concept drift in the spam domain. Our proposal is based on the definition of two complementary techniques able to select both terms and e-mails representative of the current situation. The enhanced system is evaluated against other well-known successful lazy learning approaches in two scenarios, all within a cost-sensitive framework. The results obtained from the experiments are very promising and support the idea that instance-based reasoning systems offer a number of advantages in tackling concept drift in dynamic problems such as anti-spam filtering. (c) 2006 Elsevier Ltd. All rights reserved.
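The abstract does not describe the authors' system in enough detail to reproduce it, so the following is only a minimal sketch of the general idea: a lazy (instance-based) learner that keeps just the most recent e-mails, so older cases stop influencing decisions as the spam concept drifts. Everything here is an assumption made for illustration, not the paper's method: the class name `SlidingWindowKNN`, the Jaccard similarity over whitespace tokens, the fixed-size window used as a stand-in for selecting representative e-mails, and the `cost_ratio` threshold used as a simple cost-sensitive decision rule.

```python
from collections import deque


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SlidingWindowKNN:
    """Hypothetical lazy k-NN filter that only remembers recent messages."""

    def __init__(self, k: int = 3, window: int = 100, cost_ratio: float = 1.0):
        self.k = k
        # Assumed cost model: blocking a legitimate e-mail is cost_ratio
        # times worse than letting a spam message through.
        self.cost_ratio = cost_ratio
        # A deque with maxlen discards the oldest case once the window is full.
        self.cases = deque(maxlen=window)

    def learn(self, text: str, is_spam: bool) -> None:
        """Store a labelled message; no model is built (lazy learning)."""
        self.cases.append((frozenset(text.lower().split()), is_spam))

    def is_spam(self, text: str) -> bool:
        """Vote among the k most similar stored messages."""
        if not self.cases:
            return False
        tokens = frozenset(text.lower().split())
        nearest = sorted(
            self.cases, key=lambda case: jaccard(tokens, case[0]), reverse=True
        )[: self.k]
        spam_fraction = sum(1 for _, label in nearest if label) / len(nearest)
        # Flag as spam only when the spam share exceeds the cost-based threshold.
        return spam_fraction > self.cost_ratio / (1.0 + self.cost_ratio)


spam_filter = SlidingWindowKNN(k=1, window=2)
spam_filter.learn("cheap pills online", True)
spam_filter.learn("meeting agenda attached", False)
print(spam_filter.is_spam("cheap pills now"))   # True: nearest case is spam

# A newer legitimate message pushes the old spam case out of the window.
spam_filter.learn("cheap flights agenda", False)
print(spam_filter.is_spam("cheap pills now"))   # False: only recent cases remain
```

Raising `cost_ratio` makes the sketch more conservative about flagging mail as spam, which is one common way to express that false positives cost more than false negatives.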
Pages: 36-48
Number of pages: 13