Detecting phishing e-mails using Text and Data mining

Cited by: 0
Authors
Pandey, Mayank [1]
Ravi, Vadlamani [1]
Affiliations
[1] Inst Dev & Res Banking Technol, Hyderabad, Andhra Pradesh, India
Keywords
Multilayer Perceptron; Decision Tree; Logistic regression; Support Vector Machine; Group Method Of Data Handling; Phishing webpage; Probabilistic Neural Network; Genetic Programming; Text mining; Classification; ATTACKS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper uses text mining and data mining in tandem to detect phishing emails. The study employs Multilayer Perceptron (MLP), Decision Tree (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Network (PNN), Genetic Programming (GP), and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing emails is analyzed after text mining is used to extract 23 keywords from the email bodies. The 12 most important features are then selected using t-statistic based feature selection. A t-test at the 1% level of significance showed no statistically significant difference in sensitivity with and without feature selection for all techniques except PNN. Because GP and DT are not statistically significantly different at the 1% level of significance, either with or without feature selection, DT should be preferred: it yields 'if-then' rules, which make the system easier to comprehend.
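The t-statistic based feature selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the common two-sample form |mean1 - mean2| / sqrt(var1/n1 + var2/n2) computed per feature between the phishing and non-phishing classes. The function names, the synthetic data, and the class sizes are hypothetical and are not taken from the paper; only the counts of 23 candidate features and 12 selected features come from the abstract.

```python
import math
import random
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic for one feature (assumed Welch form)."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    diff = abs(mean(pos) - mean(neg))
    if se == 0.0:
        # Zero variance in both classes: perfectly separating or uninformative.
        return math.inf if diff > 0 else 0.0
    return diff / se


def select_top_k(rows, labels, k):
    """Rank feature columns by |t| between the two classes; return top-k indices."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((t_statistic(pos, neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def make_toy_data(n_per_class=100, n_features=23, n_informative=12, seed=0):
    """Synthetic stand-in for keyword features; NOT the paper's dataset."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (1, 0):  # 1 = phishing, 0 = non-phishing
        for _ in range(n_per_class):
            # Only the first n_informative features differ between classes.
            row = [
                rng.gauss(3.0 if (label == 1 and j < n_informative) else 0.0, 1.0)
                for j in range(n_features)
            ]
            rows.append(row)
            labels.append(label)
    return rows, labels


rows, labels = make_toy_data()
selected = select_top_k(rows, labels, k=12)
print(sorted(selected))
```

On this toy data the informative columns are separated by a wide margin, so the ranking recovers them. With real keyword features the gap between retained and discarded features would typically be much smaller, and the cut-off of 12 would be a modelling choice rather than an obvious break.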
Pages: 249-254
Page count: 6