Cost-Sensitive Spam Detection Using Parameters Optimization and Feature Selection

Cited by: 0
Authors
Lee, Sang Min [1]
Kim, Dong Seong [2]
Park, Jong Sou [1]
Affiliations
[1] Korea Aerosp Univ, Dept Comp Engn, Seoul, South Korea
[2] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27706 USA
Keywords
Feature Selection; Intrusion Detection; Parameters Optimization; Random Forests; Spam Detection; Spambase;
DOI
Not available
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
E-mail spam is no longer merely a nuisance but a risk: it now carries virus attachments and spyware agents that can damage recipients' systems, so there is a growing need for spam detection. Many spam detection techniques based on machine learning have been proposed. Because the volume of spam has increased tremendously through the use of bulk mailing tools, spam detection techniques must keep pace with it. To cope with this, parameter optimization and feature selection have been used to reduce processing overhead while guaranteeing high detection rates. However, previous approaches have not taken into account feature variable importance or the optimal number of features. Moreover, to the best of our knowledge, no existing approach uses parameter optimization and feature selection together for spam detection. In this paper, we propose a spam detection model that enables both parameter optimization and optimal feature selection: we optimize two parameters of the detection model using Random Forests (RF) so as to maximize the detection rate. We provide the variable importance of each feature so that irrelevant features are easy to eliminate. Furthermore, we determine the optimal number of selected features using two methods: (i) optimizing the parameters only once for the whole feature selection process and (ii) optimizing the parameters in every feature elimination phase. Finally, we evaluate our spam detection model with cost-sensitive measures to avoid misclassifying legitimate messages, since the cost of classifying a legitimate message as spam far outweighs the cost of classifying spam as legitimate. We perform experiments on the Spambase dataset and show the feasibility of our approaches.
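The abstract does not say which cost-sensitive measures are used. As a rough illustration only, the sketch below assumes two measures that are common in the spam filtering literature: weighted accuracy and total cost ratio, where blocking a legitimate message is counted as `lam` times as costly as letting a spam message through. The `Confusion` class, the function names, the weight `lam`, and the example counts are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of the four outcomes of a spam filter (hypothetical helper)."""
    legit_as_legit: int  # legitimate messages correctly passed
    legit_as_spam: int   # legitimate messages wrongly blocked (the costly error)
    spam_as_spam: int    # spam messages correctly blocked
    spam_as_legit: int   # spam messages wrongly passed


def weighted_accuracy(c: Confusion, lam: float) -> float:
    """Accuracy in which every legitimate message counts lam times."""
    n_legit = c.legit_as_legit + c.legit_as_spam
    n_spam = c.spam_as_spam + c.spam_as_legit
    return (lam * c.legit_as_legit + c.spam_as_spam) / (lam * n_legit + n_spam)


def total_cost_ratio(c: Confusion, lam: float) -> float:
    """Cost of using no filter divided by the cost of using this filter.

    Values above 1 mean the filter is better than no filter at all
    under the assumed cost weight lam.
    """
    n_spam = c.spam_as_spam + c.spam_as_legit
    cost = lam * c.legit_as_spam + c.spam_as_legit
    return float("inf") if cost == 0 else n_spam / cost


# Made-up counts for illustration: 100 legitimate and 100 spam messages.
example = Confusion(legit_as_legit=95, legit_as_spam=5,
                    spam_as_spam=90, spam_as_legit=10)

for lam in (1, 9):
    print(f"lam={lam}: WAcc={weighted_accuracy(example, lam):.4f}, "
          f"TCR={total_cost_ratio(example, lam):.4f}")
```

With these made-up counts, raising `lam` from 1 to 9 lowers the total cost ratio sharply, because the five blocked legitimate messages dominate the cost. This is the kind of asymmetry the abstract refers to when it says that misclassifying a legitimate message far outweighs missing a spam message.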
Pages: 944-960
Number of pages: 17
Related papers
50 in total
  • [21] Using Cost-Sensitive Learning and Feature Selection Algorithms to Improve the Performance of Imbalanced Classification
    Feng, Fang
    Li, Kuan-Ching
    Shen, Jun
    Zhou, Qingguo
    Yang, Xuhui
    IEEE ACCESS, 2020, 8 : 69979 - 69996
  • [22] Rough sets and Laplacian score based cost-sensitive feature selection
    Yu, Shenglong
    Zhao, Hong
    PLOS ONE, 2018, 13 (06):
  • [23] COST-SENSITIVE FEATURE SELECTION BASED ON LABEL SIGNIFICANCE AND POSITIVE REGION
    Huang, Jintao
    Qian, Wenbin
    Wu, Binglong
    Wang, Yinglong
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2019, : 403 - 409
  • [24] A Cost-Sensitive Feature Selection Method for High-Dimensional Data
    An, Chaojie
    Zhou, Qifeng
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019, : 1089 - 1094
  • [25] Sequential Cost-Sensitive Feature Acquisition
    Contardo, Gabriella
    Denoyer, Ludovic
    Artieres, Thierry
    ADVANCES IN INTELLIGENT DATA ANALYSIS XV, 2016, 9897 : 284 - 294
  • [26] Experiments with cost-sensitive feature evaluation
    Robnik-Sikonja, M
    MACHINE LEARNING: ECML 2003, 2003, 2837 : 325 - 336
  • [27] An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem
    Bian, Jing
    Peng, Xin-guang
    Wang, Ying
    Zhang, Hai
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [28] Cost-sensitive feature acquisition and classification
    Ji, Shihao
    Carin, Lawrence
    PATTERN RECOGNITION, 2007, 40 (05) : 1474 - 1485
  • [29] Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features
    Zhou, Qifeng
    Zhou, Hao
    Li, Tao
    KNOWLEDGE-BASED SYSTEMS, 2016, 95 : 1 - 11
  • [30] Applying cost-sensitive multiobjective genetic programming to feature extraction for spam e-mail filtering
    Zhang, Yang
    Li, HongYu
    Niranjan, Mahesan
    Rockett, Peter
    GENETIC PROGRAMMING, PROCEEDINGS, 2008, 4971 : 325 - +