Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Citations: 0

Authors:

Al Tawil, Arar [1]

Almazaydeh, Laiali [2]

Qawasmeh, Doaa [3]

Qawasmeh, Baraah [4]

Alshinwan, Mohammad [1,5]

Elleithy, Khaled [6]

Affiliations:

[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan

[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates

[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan

[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA

[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan

[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / Issue 02

Keywords:

Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)

DOI:

10.32604/cmc.2024.057279

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study compares three text representation methods, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT), to evaluate how effectively machine learning algorithms detect phishing attacks. Features extracted with each method are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was again the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved with the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. The study highlights how advanced pre-trained models such as BERT can significantly enhance the accuracy and reliability of fraud detection systems.
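To make the abstract's terms concrete, the following is a minimal Python sketch of how TF-IDF weights and the four reported metrics (Precision, Recall, F1-score, Accuracy) can be computed. It is not the authors' implementation, which this record does not describe. The function names `tfidf` and `binary_metrics`, the whitespace tokenizer, the smoothed IDF formula, and the sample emails are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = term frequency * smoothed IDF, where
    IDF = ln((1 + N) / (1 + df)) + 1. Other TF-IDF variants exist.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Made-up example emails, not data from the study.
    emails = [
        "verify your account now",
        "meeting notes attached",
        "verify your password",
    ]
    for vector in tfidf(emails):
        print(vector)
    # Hypothetical labels: 1 = phishing, 0 = legitimate.
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this sketch a term that appears in fewer documents receives a larger weight than an equally frequent term that appears in many, which is the property that lets a downstream classifier focus on distinctive wording. A real pipeline would typically feed such vectors into one of the classifiers named in the abstract.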

Pages: 3395 - 3412

Page count: 18

50 items in total

[31] Intrusion detection using a hybrid support vector machine based on entropy and TF-IDF
Chen, Rung-Ching
Chen, Su-Ping
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2008, 4 (02): 413 - 424
[32] Are AI systems biased against the poor? A machine learning analysis using Word2Vec and GloVe embeddings
Curto, Georgina
Jojoa Acosta, Mario Fernando
Comim, Flavio
Garcia-Zapirain, Begona
AI & SOCIETY, 2024, 39 (02): 617 - 632
[34] Detection of SMS Spam Messages Using TF-IDF Vectorizer and Deep Learning Models
Bravo, John Adam, V
De Goma, Joel C.
Prudente, Springtime
Rondilla, Robert Francis A.
PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2024, 2024: 245 - 249
[35] A deep learning analysis on question classification task using Word2vec representations
Yilmaz, Seyhmus
Toklu, Sinan
NEURAL COMPUTING & APPLICATIONS, 2020, 32 (07): 2909 - 2928
[37] Comparative evaluation of machine learning algorithms for phishing site detection
Almujahid, Noura Fahad
Haq, Mohd Anul
Alshehri, Mohammed
PEERJ COMPUTER SCIENCE, 2024, 10
[38] Curated Datasets and Feature Analysis for Phishing Email Detection with Machine Learning
Champa, Arifa I.
Rabbi, Md Fazle
Zibran, Minhaz F.
2024 IEEE 3RD INTERNATIONAL CONFERENCE ON COMPUTING AND MACHINE INTELLIGENCE, ICMI 2024, 2024
[39] Comparative Study of Machine Learning Algorithms for Phishing Website Detection
Omari, Kamal
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09): 417 - 425
[40] Microblog Emotional Analysis Based on TF-IWF Weighted Word2vec Model
Tian, Hao
Wu, Liuai
PROCEEDINGS OF 2018 IEEE 9TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2018: 893 - 896
