Cost-Sensitive Spam Detection Using Parameters Optimization and Feature Selection

Cited by: 0
Authors
Lee, Sang Min [1]
Kim, Dong Seong [2]
Park, Jong Sou [1]
Affiliations
[1] Korea Aerosp Univ, Dept Comp Engn, Seoul, South Korea
[2] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27706 USA
Keywords
Feature Selection; Intrusion Detection; Parameters Optimization; Random Forests; Spam Detection; Spambase;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
E-mail spam is no longer mere garbage but a risk, since it now carries virus attachments and spyware agents that can ruin recipients' systems; there is therefore a growing need for spam detection. Many spam detection techniques based on machine learning have been proposed. As the volume of spam has increased tremendously through the use of bulk mailing tools, spam detection techniques must keep pace with it. To cope with this, parameter optimization and feature selection have been used to reduce processing overhead while guaranteeing high detection rates. However, previous approaches have not taken into account feature variable importance or the optimal number of features. Moreover, to the best of our knowledge, no existing approach uses parameter optimization and feature selection together for spam detection. In this paper, we propose a spam detection model that enables both parameter optimization and optimal feature selection: we optimize two parameters of the detection model using Random Forests (RF) so as to maximize detection rates. We provide the variable importance of each feature so that irrelevant features are easy to eliminate. Furthermore, we determine the optimal number of selected features using two methods: (i) a single parameter optimization for the entire feature selection process, and (ii) parameter optimization in every feature elimination phase. Finally, we evaluate our spam detection model with cost-sensitive measures to avoid misclassifying legitimate messages, since the cost of classifying a legitimate message as spam far outweighs the cost of classifying spam as legitimate. We perform experiments on the Spambase dataset and show the feasibility of our approaches.
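The abstract does not say which cost-sensitive measures are used. As a minimal sketch of the idea, the Python below computes a weighted accuracy in which every legitimate message counts as `cost_ratio` messages, so a legitimate message flagged as spam is penalized more heavily than a missed spam. The function name `weighted_accuracy`, the label convention (1 = spam, 0 = legitimate), and the value `cost_ratio=9.0` are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def weighted_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_ratio: float = 9.0,  # assumed weight; not a value from the paper
) -> float:
    """Accuracy in which each legitimate message counts cost_ratio times.

    Labels (assumed convention): 1 = spam, 0 = legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    legit_total = sum(1 for t in y_true if t == 0)
    spam_total = len(y_true) - legit_total

    # Correctly classified messages in each class.
    legit_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    spam_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

    denominator = cost_ratio * legit_total + spam_total
    if denominator == 0:
        raise ValueError("no messages to evaluate")
    return (cost_ratio * legit_correct + spam_correct) / denominator


# Two hypothetical classifiers, each making exactly one mistake on six messages.
y_true = [0, 0, 0, 1, 1, 1]
blocks_legit = [0, 0, 1, 1, 1, 1]  # one legitimate message flagged as spam
misses_spam = [0, 0, 0, 1, 1, 0]   # one spam message let through

score_blocks_legit = weighted_accuracy(y_true, blocks_legit)
score_misses_spam = weighted_accuracy(y_true, misses_spam)
```

Both hypothetical classifiers have the same plain accuracy (5 of 6 correct), but under this weighting the one that blocks a legitimate message scores lower than the one that misses a spam, which is the asymmetry the abstract argues for.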
Pages: 944 - 960
Number of pages: 17
Related papers
50 in total
  • [1] Spam Detection Using Feature Selection and Parameters Optimization
    Lee, Sang Min
    Kim, Dong Seong
    Kim, Ji Ho
    Park, Jong Sou
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010, : 883 - 888
  • [2] Cost-sensitive feature selection based on Adaptive Hunting Optimization
    Liang, Yixuan
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND ARTIFICIAL INTELLIGENCE, CCAI 2024, 2024, : 546 - 551
  • [3] Spam detection on social networks using cost-sensitive feature selection and ensemble-based regularized deep neural networks
    Aliaksandr Barushka
    Petr Hajek
    Neural Computing and Applications, 2020, 32 : 4239 - 4257
  • [4] Spam detection on social networks using cost-sensitive feature selection and ensemble-based regularized deep neural networks
    Barushka, Aliaksandr
    Hajek, Petr
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (09): : 4239 - 4257
  • [5] Cost-Sensitive Feature Selection on Heterogeneous Data
    Qian, Wenbin
    Shu, Wenhao
    Yang, Jun
    Wang, Yinglong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078 : 397 - 408
  • [6] Cost-sensitive and sequential feature selection for chiller fault detection and diagnosis
    Yan, Ke
    Ma, Lulu
    Dai, Yuting
    Shen, Wen
    Ji, Zhiwei
    Xie, Dongqing
    INTERNATIONAL JOURNAL OF REFRIGERATION, 2018, 86 : 401 - 409
  • [7] Enhanced Detection of Text and Image Spam Using Cost-Sensitive Deep Learning
    Mallampati, Deepika
    Hegde, Nagaratna P.
    TRAITEMENT DU SIGNAL, 2024, 41 (03) : 1283 - 1292
  • [8] Cost-Sensitive Feature Selection using Particle Swarm Optimization: A Post-Processing Approach
    Ali, Syed Imran
    Khan, Wajahat Ali
    Lee, Sungyoung
    Lee, Sang-Ho
    2020 34TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2020), 2020, : 97 - 101
  • [9] Cost-Sensitive Feature Selection via F-Measure Optimization Reduction
    Liu, Meng
    Xu, Chang
    Luo, Yong
    Xu, Chao
    Wen, Yonggang
    Tao, Dacheng
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2252 - 2258
  • [10] Cost-Sensitive Feature Selection for Class Imbalance Problem
    Bach, Malgorzata
    Werner, Aleksandra
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 182 - 194