Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Authors
Al Tawil, Arar [1]
Almazaydeh, Laiali [2]
Qawasmeh, Doaa [3]
Qawasmeh, Baraah [4]
Alshinwan, Mohammad [1, 5]
Elleithy, Khaled [6]
Affiliations
[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan
[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates
[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan
[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA
[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan
[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / No. 02
Keywords
Attacks; email phishing; machine learning; security; representations from transformers (BERT); text classifier; natural language processing (NLP)
DOI
10.32604/cmc.2024.057279
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study compares three text representation methods, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT), to evaluate how effectively various machine learning algorithms detect phishing attacks. Features produced by these methods are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF features, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec features, the best-performing classifier was also the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance overall was achieved with the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. This study highlights how advanced pre-trained models, such as BERT, can significantly enhance the accuracy and reliability of fraud detection systems.
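
For readers unfamiliar with the four metrics reported in the abstract, the following is a minimal sketch of how Precision, Recall, F1-score, and Accuracy can be computed for a binary phishing/legitimate classifier. The function name `binary_metrics`, the label encoding (1 = phishing, 0 = legitimate), and the toy labels are assumptions made for illustration; they are not taken from the paper, and the paper may report class-averaged variants of these metrics.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for one positive class.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels (assumed): 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy data there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out equal. That mirrors the pattern in the abstract, where the four reported values coincide for each configuration.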
Pages: 3395-3412
Page count: 18