Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT

Authors
Al Tawil, Arar [1]
Almazaydeh, Laiali [2]
Qawasmeh, Doaa [3]
Qawasmeh, Baraah [4]
Alshinwan, Mohammad [1, 5]
Elleithy, Khaled [6]
Affiliations
[1] Appl Sci Private Univ, Fac Informat Technol, Amman 11931, Jordan
[2] Abu Dhabi Univ, Coll Engn, POB 1790, Abu Dhabi, U Arab Emirates
[3] Al Balqa Appl Univ, Fac Artificial Intelligence, Salt 19117, Jordan
[4] Western Michigan Univ, Dept Civil & Construct Engn, Kalamazoo, MI 49008 USA
[5] Middle East Univ, MEU Res Unit, Amman 11831, Jordan
[6] Univ Bridgeport, Dept Comp Sci & Engn, Bridgeport, CT 06604 USA
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 81 / No. 02
Keywords
Attacks; email phishing; machine learning; security; bidirectional encoder representations from transformers (BERT); text classifier; natural language processing (NLP)
DOI
10.32604/cmc.2024.057279
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Cybercriminals often use fraudulent emails and fictitious email accounts to deceive individuals into disclosing confidential information, a practice known as phishing. This study evaluates how effectively several machine learning algorithms detect phishing attacks under three text representation methods: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT). The feature extraction methods are used to assess the performance of Logistic Regression, Decision Tree, Random Forest, and Multilayer Perceptron classifiers. With TF-IDF, the best-performing classifier was the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). With Word2Vec, the best-performing classifier was also the Multilayer Perceptron (Precision: 0.98, Recall: 0.98, F1-score: 0.98, Accuracy: 0.98). The highest performance was achieved with the BERT model, with Precision, Recall, F1-score, and Accuracy all reaching 0.99. The study highlights how advanced pre-trained models such as BERT can significantly enhance the accuracy and reliability of fraud detection systems.
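For readers unfamiliar with the four metrics reported in the abstract, the following is a minimal sketch of how Precision, Recall, F1-score, and Accuracy can be computed for a binary phishing classifier. It is illustrative only: the function name binary_metrics, the label convention (1 = phishing, 0 = legitimate), and the toy labels are assumptions, not taken from the paper, and the paper may use a different averaging scheme (for example, weighted averages over both classes).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy for the positive class.

    Assumption (not from the paper): label 1 means phishing, 0 means legitimate.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed

    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels, invented for illustration (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On a balanced test set where a classifier makes equally many false positives and false negatives, all four values coincide, which is one way a single figure such as 0.98 can appear for every metric.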
Pages: 3395-3412
Page count: 18