Imbalanced Learning in Massive Phishing Datasets

Cited by: 3
Authors
Azari, Ali [1 ]
Namayanja, Josephine M. [2 ]
Kaur, Navneet [1 ]
Misal, Vasundhara [1 ]
Shukla, Suraksha [1 ]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Univ Massachusetts, Dept Management Sci & Informat Syst, Boston, MA USA
Keywords
big data; classification; ensemble learning; imbalanced learning; phishing; decision tree
DOI
10.1109/BigDataSecurity-HPSC-IDS49724.2020.00032
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Phishing is one of the major threats facing internet users in today's world. Such attacks continue to cost companies around the world billions of dollars, so more efficient detection techniques are needed to curb the danger. This paper proposes a big-data-friendly implementation of Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES) that detects phishing attacks with high accuracy. The proposed method is compatible with SPARK and can be trained in parallel on a cluster of nodes, so training time can be reduced by increasing the size of the cluster. In addition, a comparative study against classic machine learning techniques such as Random Forest, Naive Bayes, and Decision Trees shows that the proposed MILES method provides significantly higher precision and recall.
Pages: 127 - 132
Number of pages: 6
Related papers
50 in total
  • [31] Combining integrated sampling with SVM ensembles for learning from imbalanced datasets
    Liu, Yang
    Yu, Xiaohui
    Huang, Jimmy Xiangji
    An, Aijun
    INFORMATION PROCESSING & MANAGEMENT, 2011, 47 (04) : 617 - 631
  • [32] Detection of Cancer Patients Using an Innovative Method for Learning at Imbalanced Datasets
    Parvin, Hamid
    Minaei-Bidgoli, Behrouz
    Alizadeh, Hosein
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, 2011, 6954 : 376 - 381
  • [33] A Comparative Analysis of Convergence Rate for Imbalanced Datasets of Active Learning Models
    Zhang, Haoke
    Wu, Wanqing
    Pirbhulal, Sandeep
    Li, Guanglin
    Zhang, Hongyi
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [34] Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets
    Huang, Yingsong
    Bai, Bing
    Zhao, Shengwei
    Bai, Kun
    Wang, Fei
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6960 - 6969
  • [35] Effect of Imbalanced Datasets on Security of Industrial IoT Using Machine Learning
    Zolanvari, Maede
    Teixeira, Marcio A.
    Jain, Raj
    2018 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2018, : 112 - 117
  • [36] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Raghuwanshi, Bhagat Singh
    Mangal, Akansha
    Shukla, Sanyam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (11) : 3387 - 3408
  • [37] Learning a Distance Metric by Balancing KL-Divergence for Imbalanced Datasets
    Feng, Lin
    Wang, Huibing
    Jin, Bo
    Li, Haohao
    Xue, Mingliang
    Wang, Le
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2019, 49 (12) : 2384 - 2395
  • [38] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Bhagat Singh Raghuwanshi
    Akansha Mangal
    Sanyam Shukla
    International Journal of Machine Learning and Cybernetics, 2022, 13 : 3387 - 3408
  • [39] Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
    Cao, Kaidi
    Wei, Colin
    Gaidon, Adrien
    Arechiga, Nikos
    Ma, Tengyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [40] A Study on Machine Learning for Imbalanced Datasets with Answer Validation of Question Answering
    Day, Min-Yuh
    Tsai, Cheng-Chia
    PROCEEDINGS OF 2016 IEEE 17TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI), 2016, : 513 - 519