Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Cited by: 0

Authors:

Al Tawil, Arar [1]

Almazaydeh, Laiali [2]

Qawasmeh, Doaa [3]

Qawasmeh, Baraah [4]

Alshinwan, Mohammad [1,5]

Elleithy, Khaled [6]

Affiliations:

[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan

[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates

[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan

[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA

[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan

[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / Issue 02

Keywords:

Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)

DOI:

10.32604/cmc.2024.057279

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study compares three text representation methods, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT), to evaluate how effectively machine learning algorithms detect phishing emails. Features extracted with these methods are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was also the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved with the BERT model, for which Precision, Recall, F1-score, and Accuracy all reached 0.99. These results suggest that advanced pre-trained models such as BERT can significantly enhance the accuracy and reliability of fraud detection systems.
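The abstract names TF-IDF features and reports Precision, Recall, F1-score, and Accuracy. The following is a minimal illustrative sketch of how such quantities can be computed, not the authors' implementation. The function names `tfidf_vectors` and `binary_metrics`, the whitespace tokenization, the `log(N/df) + 1` weighting variant, and the toy emails are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / doc_length) * (ln(N / df) + 1).
    """
    # Naive lowercase whitespace tokenization (an assumption of this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data, invented for illustration only.
emails = [
    "verify your account now",
    "meeting agenda for monday",
    "verify your password now",
]
vectors = tfidf_vectors(emails)
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy corpus, a term that occurs in only one email ("account") receives a higher weight than one shared by two emails ("verify"), which is the property that makes TF-IDF useful as classifier input.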


Pages: 3395-3412

Page count: 18
