Imbalanced Learning in Massive Phishing Datasets

Cited by: 3

Authors:

Azari, Ali [1]

Namayanja, Josephine M. [2]

Kaur, Navneet [1]

Misal, Vasundhara [1]

Shukla, Suraksha [1]

Affiliations:

[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA

[2] Univ Massachusetts, Dept Management Sci & Informat Syst, Boston, MA USA

Source:

2020 IEEE 6TH INT CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / 6TH IEEE INT CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, (HPSC) / 5TH IEEE INT CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS) | 2020

Keywords:

big data; classification; ensemble learning; imbalanced learning; phishing; decision tree

DOI:

10.1109/BigDataSecurity-HPSC-IDS49724.2020.00032

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Phishing is one of the major threats facing internet users in today's world. Such attacks continue to cost companies around the world billions of dollars, so more efficient detection techniques are needed to curb the danger. This paper proposes a big-data-friendly implementation of Multiclass Imbalance Learning in Ensembles through Selective Sampling (MILES) that detects phishing attacks with high accuracy. The proposed method is compatible with Spark and can be trained in parallel on a cluster of nodes, so training time can be reduced by increasing the size of the cluster. In addition, a comparative study against classic machine learning techniques such as Random Forest, Naive Bayes, and Decision Trees shows that the proposed MILES method provides significantly higher precision and recall.


Pages: 127-132

Page count: 6
