Imbalanced Learning in Massive Phishing Datasets

Cited by: 3
Authors
Azari, Ali [1]
Namayanja, Josephine M. [2]
Kaur, Navneet [1]
Misal, Vasundhara [1]
Shukla, Suraksha [1]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Univ Massachusetts, Dept Management Sci & Informat Syst, Boston, MA USA
Keywords
big data; classification; ensemble learning; imbalanced learning; phishing; decision tree
DOI
10.1109/BigDataSecurity-HPSC-IDS49724.2020.00032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Phishing is one of the major threats facing internet users in today's world. Such attacks continue to cost companies around the world billions of dollars, which calls for more efficient detection techniques. This paper proposes a big-data-friendly implementation of Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES) that detects phishing attacks with high accuracy. The proposed method is compatible with Spark and can be trained in parallel on a cluster of nodes, so training time can be reduced by increasing the size of the cluster. In addition, a comparative study against classic machine learning techniques such as Random Forest, Naive Bayes, and Decision Trees shows that the proposed MILES method provides significantly higher precision and recall.
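As a rough illustration of the general idea behind sampling-based ensembles for imbalanced data, the sketch below builds class-balanced training bags by random undersampling, combines member predictions by majority vote, and computes precision and recall. It is not the MILES algorithm: the selective sampling step of MILES is not described in this record, so plain random undersampling is swapped in, and all function names (`balanced_bags`, `majority_vote`, `precision_recall`) are hypothetical.

```python
import random
from collections import Counter


def balanced_bags(labels, n_bags, seed=0):
    """Return n_bags lists of row indices with equal counts per class.

    Assumption: plain random undersampling, not MILES selective sampling.
    Each bag draws (without replacement) as many rows from every class as
    the smallest class contains, so each bag sees a different majority subset.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    k = min(len(idx) for idx in by_class.values())
    bags = []
    for _ in range(n_bags):
        bag = []
        for idx in by_class.values():
            bag.extend(rng.sample(idx, k))
        bags.append(bag)
    return bags


def majority_vote(predictions):
    """Combine one predicted label per ensemble member into a single label."""
    return Counter(predictions).most_common(1)[0][0]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the chosen positive (e.g. phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because the bags are independent of one another, one model per bag could in principle be trained on a separate worker. Whether the Spark implementation in the paper parallelizes in this way is not stated in the abstract.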
Pages: 127-132
Number of pages: 6