Imbalanced Learning in Massive Phishing Datasets

Cited by: 3
Authors
Azari, Ali [1]
Namayanja, Josephine M. [2]
Kaur, Navneet [1]
Misal, Vasundhara [1]
Shukla, Suraksha [1]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Univ Massachusetts, Dept Management Sci & Informat Syst, Boston, MA USA
Keywords
big data; classification; ensemble learning; imbalanced learning; phishing; DECISION TREE
DOI
10.1109/BigDataSecurity-HPSC-IDS49724.2020.00032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Phishing is one of the major threats facing internet users in today's world. Such attacks continue to cost companies around the world billions of dollars, which calls for more efficient detection techniques to curb the danger. This paper proposes a big-data-friendly implementation of Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES) that detects phishing attacks with high accuracy. The proposed method is compatible with Apache Spark and can be trained in parallel on a cluster of nodes, so training time decreases as the cluster grows. In addition, a comparative study against classic machine learning techniques such as Random Forest, Naive Bayes, and Decision Trees shows that the proposed MILES method provides significantly higher precision and recall.
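The abstract describes MILES only at a high level. As a rough illustration of the general idea behind ensemble learning with selective (balanced) sampling, the sketch below trains each ensemble member on a subsample in which every class is cut down to the size of the smallest class, then combines members by majority vote, and scores the result with precision and recall on the minority "phish" class. This is a plain random-undersampling ensemble (in the spirit of under-bagging), not the MILES algorithm itself, whose selection step the abstract does not specify. The nearest-centroid base learner, all helper names, and the synthetic data are hypothetical choices made for a self-contained, standard-library-only example.

```python
import random
from collections import Counter

# Hypothetical helpers: a generic undersampling ensemble, NOT the paper's MILES algorithm.

def balanced_subsample(X, y, rng):
    """Randomly undersample every class down to the size of the smallest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n):
            Xs.append(row)
            ys.append(label)
    return Xs, ys


def fit_centroids(X, y):
    """Nearest-centroid base learner: the mean feature vector of each class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))


class BalancedEnsemble:
    """Each member sees a different balanced subsample; prediction is a majority vote."""

    def __init__(self, n_members=11, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        # Members do not depend on each other, so this loop could be distributed.
        self.members = [fit_centroids(*balanced_subsample(X, y, self.rng))
                        for _ in range(self.n_members)]
        return self

    def predict(self, rows):
        out = []
        for row in rows:
            votes = Counter(predict_centroids(m, row) for m in self.members)
            out.append(votes.most_common(1)[0][0])
        return out


def precision_recall(y_true, y_pred, positive):
    """Precision and recall of the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_toy_data(n_legit=950, n_phish=50, seed=1):
    """Synthetic 2-D data with a 19:1 class ratio (illustrative only, not from the paper)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(n_legit)]
    X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(n_phish)]
    y = ["legit"] * n_legit + ["phish"] * n_phish
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    model = BalancedEnsemble(n_members=11, seed=0).fit(X, y)
    p, r = precision_recall(y, model.predict(X), positive="phish")
    print(f"precision={p:.3f} recall={r:.3f}")
```

Because each member is fit on its own subsample, members are independent of one another; that independence is the kind of property that allows an ensemble to be trained in parallel across cluster nodes, as the abstract reports for its Spark-compatible implementation. Precision and recall are reported for the phishing class because, at a 19:1 ratio, a model that labels everything "legit" would already reach 95% accuracy while catching no phishing at all.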
Pages: 127–132
Number of pages: 6
Related papers
50 in total
  • [21] Exploring Data Augmentation and Active Learning Benefits in Imbalanced Datasets
    Moles, Luis
    Andres, Alain
    Echegaray, Goretti
    Boto, Fernando
    MATHEMATICS, 2024, 12 (12)
  • [22] Battering Review Spam Through Ensemble Learning in Imbalanced Datasets
    Khurshid, Faisal
    Zhu, Yan
    Hu, Jie
    Ahmad, Muqeet
    Ahmad, Mushtaq
    COMPUTER JOURNAL, 2022, 65 (07): : 1666 - 1678
  • [23] Learning rebalanced human parsing model from imbalanced datasets
    Huang, Enbo
    Su, Zhuo
    Zhou, Fan
    Wang, Ruomei
    IMAGE AND VISION COMPUTING, 2020, 99
  • [24] Certainty-based active learning for sampling imbalanced datasets
    Fu, JuiHsi
    Lee, SingLing
    NEUROCOMPUTING, 2013, 119 : 350 - 358
  • [25] Weighting Schemes for Federated Learning in Heterogeneous and Imbalanced Segmentation Datasets
    Otalora, Sebastian
    Rafael-Patino, Jonathan
    Madrona, Antoine
    Fischi-Gomez, Elda
    Ravano, Veronica
    Kober, Tobias
    Christensen, Soren
    Hakim, Arsany
    Wiest, Roland
    Richiardi, Jonas
    McKinley, Richard
    BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES, BRAINLES 2022, 2023, 13769 : 45 - 56
  • [26] Improving the Performance of Sentiment Classification on Imbalanced Datasets With Transfer Learning
    Xiao, Z.
    Wang, L.
    Du, J. Y.
    IEEE ACCESS, 2019, 7 : 28281 - 28290
  • [27] A modified adaptive synthetic sampling method for learning imbalanced datasets
    Hussein, Ahmed Saad
    Li, Tianrui
    Abd Ali, Doaa Mohsin
    Bashir, Kamal
    Yohannese, Chubato Wondaferaw
    DEVELOPMENTS OF ARTIFICIAL INTELLIGENCE TECHNOLOGIES IN COMPUTATION AND ROBOTICS, 2020, 12 : 76 - 83
  • [28] Robustness of learning techniques in handling class noise in imbalanced datasets
    Anyfantis, D.
    Karagiannopoulos, M.
    Kotsiantis, S.
    Pintelas, P.
    ARTIFICIAL INTELLIGENCE AND INNOVATIONS 2007: FROM THEORY TO APPLICATIONS, 2007, : 21 - +
  • [29] Machine Learning With Variational AutoEncoder for Imbalanced Datasets in Intrusion Detection
    Lin, Ying-Dar
    Liu, Zi-Qiang
    Hwang, Ren-Hung
    Nguyen, Van-Linh
    Lin, Po-Ching
    Lai, Yuan-Cheng
    IEEE ACCESS, 2022, 10 : 15247 - 15260
  • [30] Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
    Lemaitre, Guillaume
    Nogueira, Fernando
    Aridas, Christos K.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18