Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Authors
Al Tawil, Arar [1]
Almazaydeh, Laiali [2]
Qawasmeh, Doaa [3]
Qawasmeh, Baraah [4]
Alshinwan, Mohammad [1, 5]
Elleithy, Khaled [6]
Affiliations
[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan
[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates
[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan
[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA
[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan
[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / No. 02
Keywords
Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)
DOI
10.32604/cmc.2024.057279
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study evaluates the effectiveness of several machine learning algorithms in detecting phishing emails using three text representation methods: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT). The features extracted by these methods are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was again the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved with the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. These results highlight how advanced pre-trained models such as BERT can significantly enhance the accuracy and reliability of fraud detection systems.
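The four figures quoted in the abstract (Precision, Recall, F1-score, Accuracy) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the helper `binary_metrics`, the `Metrics` container, and the toy labels are all hypothetical, and the abstract does not say which averaging scheme (per-class, macro, or weighted) was used, so the sketch assumes a single positive class (1 = phishing).

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for one positive class.

    Hypothetical helper for illustration; undefined ratios fall back to 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f1, accuracy)


# Toy labels, made up for illustration: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

On these toy labels there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. Equal values across all four metrics, as in the abstract, arise when false positives and false negatives are balanced.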
Pages: 3395-3412
Page count: 18