Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Times cited: 0

Authors:

Al Tawil, Arar [1]

Almazaydeh, Laiali [2]

Qawasmeh, Doaa [3]

Qawasmeh, Baraah [4]

Alshinwan, Mohammad [1,5]

Elleithy, Khaled [6]

Affiliations:

[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan

[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates

[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan

[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA

[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan

[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / Issue 02

Keywords:

Attacks; email phishing; machine learning; security; representations from transformers (BERT); text classifier; natural language processing (NLP)

DOI:

10.32604/cmc.2024.057279

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study uses three feature extraction methods, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT), to evaluate how effectively machine learning algorithms detect phishing attacks. With these features, the study assesses the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was also Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved with the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. This study highlights how advanced pre-trained models, such as BERT, can significantly enhance the accuracy and reliability of fraud detection systems.
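As a rough illustration of two ingredients named in the abstract, the sketch below computes TF-IDF weights for a toy corpus and the four reported metrics from a list of predictions. It is not the authors' pipeline: the function names `tfidf` and `binary_metrics`, the whitespace tokenizer, the smoothed IDF variant, the toy emails, and the choice to score the phishing class as the positive class are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: term frequency is count / document length, and the
    inverse document frequency is smoothed as ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy data: two emails and eight labelled predictions (1 = phishing).
vectors = tfidf(["verify your account now", "meeting notes for your team"])
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```

In the toy corpus, "your" occurs in both emails and therefore receives a lower weight than "verify", which occurs in only one. Any classifier's predictions can be passed to `binary_metrics` to obtain the same four figures the abstract reports.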

Pages: 3395 - 3412

Page count: 18

50 records in total

[21] DeepEPhishNet: a deep learning framework for email phishing detection using word embedding algorithms
Somesha, M.
Pais, Alwyn Roshan
SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2024, 49 (03):
[22] A detection method for android application security based on TF-IDF and machine learning
Yuan, Hongli
Tang, Yongchuan
Sun, Wenjuan
Liu, Li
PLOS ONE, 2020, 15 (09):
[23] Convolutional neural network text classification model based on Word2vec and improved TF-IDF
王根生
黄学坚
Journal of Chinese Computer Systems (小型微型计算机系统), 2019, 40 (05): 1120 - 1126
[24] Efficient Email phishing detection using Machine learning
Abdulraheem, Rana
Odeh, Ammar
Al Fayoumi, Mustafa
Keshta, Ismail
2022 IEEE 12TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE (CCWC), 2022: 354 - 358
[25] Comparative analysis of machine learning algorithms in detection of phishing websites
Kosan, Muhammed Ali
Yildiz, Oktay
Karacan, Hacer
PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, 2018, 24 (02): 276 - 282
[26] Phishing Email Detection Using Machine Learning Techniques
Alattas, Hussain
Aljohar, Fay
Aljunibi, Hawra
Alweheibi, Muneera
Alrashdi, Rawan
Al Azman, Ghadeer
Alharby, Abdulrahman
Nagy, Naya
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (04): 678 - 685
[27] Phishing Email Detection Using Machine Learning Techniques
Alammar, Meaad
Badawi, Maria Altaib
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2022, 22 (05): 277 - 283
[28] Classification of Phishing Email Using Word Embedding and Machine Learning Techniques
Somesha, M.
Pais, A.R.
Journal of Cyber Security and Mobility, 2022, 11 (03): 279 - 320
[29] Machine Learning Classification Algorithms for Phishing Detection: A Comparative Appraisal and Analysis
Gana, Noah Ndakotsu
Abdulhamid, Shafi'I Muhammad
2019 2ND INTERNATIONAL CONFERENCE OF THE IEEE NIGERIA COMPUTER CHAPTER (NIGERIACOMPUTCONF), 2019: 19 - 26
[30] Detection of Suspicious Accounts on Twitter Using Word2Vec and Sentiment Analysis
Conde-Cespedes, Patricia
Chavando, Julie
Deberry, Eliza
MULTIMEDIA AND NETWORK INFORMATION SYSTEMS, 2019, 833: 362 - 371
