Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Cited by: 0
Authors
Al Tawil, Arar [1]
Almazaydeh, Laiali [2]
Qawasmeh, Doaa [3]
Qawasmeh, Baraah [4]
Alshinwan, Mohammad [1, 5]
Elleithy, Khaled [6]
Affiliations
[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan
[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates
[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan
[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA
[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan
[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / No. 02
Keywords
Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)
DOI
10.32604/cmc.2024.057279
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study evaluates the effectiveness of several machine learning algorithms in detecting phishing attacks using three distinct feature extraction methods: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT). The extracted features are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was also the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved using the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. This study highlights how advanced pre-trained models such as BERT can significantly enhance the accuracy and reliability of fraud detection systems.
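As a rough illustration of two ideas the abstract relies on, the sketch below computes TF-IDF weights for a handful of short texts and derives Precision, Recall, F1-score, and Accuracy from binary predictions. It is not the authors' implementation. The function names `tfidf` and `binary_metrics`, the whitespace tokenizer, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, and the toy emails and labels are all assumptions made for this example. The record also does not say whether the reported scores are per-class or averaged, so the metrics here are computed for a single positive (phishing) class.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = raw term count * smoothed IDF, where
    IDF = ln((1 + N) / (1 + df)) + 1. The tokenizer and the smoothing
    are illustrative assumptions; no length normalization is applied.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy texts and labels (1 = phishing), invented for illustration only.
    emails = ["verify your account now", "meeting notes for your team"]
    for vector in tfidf(emails):
        print(vector)
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

In this toy input, a term that appears in every document (such as "your") gets the minimum IDF of 1, while terms unique to one document are weighted higher, which is the property that lets a downstream classifier focus on distinctive wording.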
Pages: 3395-3412
Page count: 18