Detecting phishing e-mails using Text and Data mining

Cited by: 0
Authors
Pandey, Mayank [1 ]
Ravi, Vadlamani [1 ]
Affiliations
[1] Inst Dev & Res Banking Technol, Hyderabad, Andhra Pradesh, India
Keywords
Multilayer Perceptron; Decision Tree; Logistic regression; Support Vector Machine; Group Method Of Data Handling; Phishing webpage; Probabilistic Neural Network; Genetic Programming; Text mining; Classification; ATTACKS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper uses text mining and data mining in tandem to detect phishing e-mails. The study employs Multilayer Perceptron (MLP), Decision Trees (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Network (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing e-mails is analyzed after 23 keywords are extracted from the e-mail bodies using text mining. The 12 most important features are then selected using t-statistic based feature selection. A t-test at the 1% level of significance shows no statistically significant difference in sensitivity between the models built with and without feature selection, for all techniques except PNN. Since GP and DT are not statistically significantly different, either with or without feature selection, at the 1% level of significance, DT should be preferred because it yields 'if-then' rules, thereby increasing the comprehensibility of the system.
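The abstract names t-statistic based feature selection but does not give the formula or implementation. The following is a minimal sketch of one common variant, assuming each feature is scored by the absolute two-sample t-statistic with unequal variances, |mean_1 - mean_0| / sqrt(var_1/n_1 + var_0/n_0), and the k highest-scoring features are kept. The function names, the toy keyword-count matrix and the value of k are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic (unequal variances) for one feature."""
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Both classes are constant: no signal unless the means differ.
        return 0.0 if mean(pos) == mean(neg) else float("inf")
    return abs(mean(pos) - mean(neg)) / se


def select_top_k(rows, labels, k):
    """Return indices of the k features with the largest |t|, best first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((t_statistic(pos, neg), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy data (hypothetical): counts of 3 keywords in 6 e-mails.
rows = [
    [5, 1, 2],
    [6, 0, 2],
    [7, 1, 3],
    [1, 1, 1],
    [0, 0, 2],
    [2, 1, 2],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate (assumed coding)
selected = select_top_k(rows, labels, k=2)
```

In the setting described in the abstract, `rows` would hold the 23 keyword features for each of the 2500 e-mails and `k` would be 12; the reduced matrix would then be passed to each classifier.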
Pages: 249 - 254
Number of pages: 6