Imbalanced Learning in Massive Phishing Datasets

Cited by: 3
Authors
Azari, Ali [1]
Namayanja, Josephine M. [2]
Kaur, Navneet [1]
Misal, Vasundhara [1]
Shukla, Suraksha [1]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Univ Massachusetts, Dept Management Sci & Informat Syst, Boston, MA USA
Keywords
big data; classification; ensemble learning; imbalanced learning; phishing; decision tree
DOI
10.1109/BigDataSecurity-HPSC-IDS49724.2020.00032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Phishing is one of the major threats facing internet users in today's world. Such attacks continue to cost companies around the world billions of dollars, so more efficient detection techniques are needed to curb the danger. This paper proposes a big-data-friendly implementation of Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES) that detects phishing attacks with high accuracy. The proposed method is compatible with Spark and can be trained in parallel on a cluster of nodes, so training time can be reduced by increasing the size of the cluster. In addition, a comparative study against classic machine learning techniques such as Random Forest, Naive Bayes, and Decision Trees shows that the proposed MILES method provides significantly higher precision and recall.
Pages: 127-132
Number of pages: 6