Detection of phishing websites using an efficient feature-based machine learning framework

Cited by: 117

Authors:

Rao, Routhu Srinivasa [1]

Pais, Alwyn Roshan [1]

Affiliations:

[1] Natl Inst Technol Karnataka, Informat Secur Res Lab, Surathkal, India

Source:

NEURAL COMPUTING & APPLICATIONS | 2019 / Vol. 31 / Issue 08

Keywords:

Cyber-attack; Phishing; Anti-phishing; Heuristic technique; Machine learning algorithms; Random Forest; Oblique Random Forest; CLASSIFICATION; ENSEMBLE; MODEL;

DOI:

10.1007/s00521-017-3305-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Phishing is a cyber-attack that targets naive online users, tricking them into revealing sensitive information such as usernames, passwords, social security numbers, or credit card numbers. Attackers fool Internet users by disguising a webpage as a trustworthy or legitimate page in order to retrieve personal information. Many anti-phishing solutions have been proposed to date, including blacklist or whitelist, heuristic, and visual similarity-based methods, but online users are still being trapped into revealing sensitive information on phishing websites. In this paper, we propose a novel classification model based on heuristic features extracted from the URL, source code, and third-party services to overcome the disadvantages of existing anti-phishing techniques. Our model was evaluated using eight different machine learning algorithms, of which the Random Forest (RF) algorithm performed best, with an accuracy of 99.31%. The experiments were repeated with different (orthogonal and oblique) random forest classifiers to find the best classifier for phishing website detection. Principal component analysis Random Forest (PCA-RF) performed best of all the oblique Random Forests (oRFs), with an accuracy of 99.55%. We also tested our model with and without third-party-based features to determine the effectiveness of third-party services in classifying suspicious websites. We also compared our results with the baseline models (CANTINA and CANTINA+). Our proposed technique outperformed these methods and also detected zero-day phishing attacks.
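
The abstract mentions heuristic features extracted from the URL but does not list them in this record. As a rough illustration only, the sketch below computes a few commonly used URL heuristics with the Python standard library. The function name `url_features` and the specific features chosen are assumptions made for this example, not the paper's actual feature set, and no classifier is trained here.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few illustrative URL heuristics.

    Hypothetical feature set for demonstration; not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A host given as a raw IP address instead of a domain name is a
    # commonly cited phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),                   # very long URLs can hide the real host
        "num_dots_in_host": host.count("."),      # many dots may indicate stacked subdomains
        "num_hyphens_in_host": host.count("-"),   # hyphens are often used to imitate brands
        "has_at_symbol": "@" in url,              # '@' can obscure the true destination
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in (
        "https://www.example.com/login",
        "http://192.0.2.10/secure-login@example.com",
    ):
        print(example, url_features(example))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to a classifier such as a Random Forest, which is the family of models the abstract reports as performing best.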

Pages: 3851-3873

Page count: 23
