Automated Detection of Malevolent Domains in Cyberspace Using Natural Language Processing and Machine Learning

Cited by: 0

Authors:

Samad, Saleem Raja Abdul [1]

Ganesan, Pradeepa [1]

Al-Kaabi, Amna Salim [1]

Rajasekaran, Justin

Singaravelan, M. [2]

Basha, Peerbasha Shebbeer [3]

Affiliations:

[1] Univ Technol & Appl Sci Ibri, Coll Comp & Informat Sci, IT Dept, Shinas, Oman

[2] Vel Tech Rangarajan Dr Sagunthala R&D Inst Sci & T, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India

[3] Jamal Mohamed Coll, Dept Comp Sci, Tiruchirappalli, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2024, Vol. 15, No. 10

Keywords:

Machine learning; N-gram; linguistic features; natural language processing (NLP); malicious webpage

DOI:

10.14569/IJACSA.2024.0151036

Chinese Library Classification number:

TP301 [Theory and Methods]

Discipline classification code:

081202

Abstract:

Cyberattacks are intentional attacks on computer systems, networks, and devices. Malware, phishing, drive-by downloads, and injection are common cyberattacks that can harm individuals, businesses, and organizations. Most of these attacks trick internet users through malicious links or webpages. Malicious webpages can be used to distribute malware, steal personal information, conduct phishing attacks, or perform other malicious activities. Detecting such webpages manually is tedious for internet users, so locating them in cyberspace requires an automated detection tool. Machine learning techniques are currently used to detect such malicious websites. Most recent studies derive a limited number of features from webpages (both benign and malicious) and use machine learning (ML) algorithms to detect fraudulent webpages. However, such a constrained feature set might not exploit the full potential of the dataset. This study addresses the issue by identifying malicious websites using both URL and webpage content features. To maximize detection accuracy, both n-grams and vectorization methods from natural language processing are adopted with a minimal feature set. To exploit the full potential of the dataset, the proposed approach derives 22 common linguistic features from the URL and generates n-grams from the domain name of the URL. The textual content of the webpages is also used. The research employs seven machine learning algorithms with three vectorization methods. The results show that the proposed method outperforms previous studies.
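
The idea of generating n-grams from a URL's domain name and deriving simple URL features can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the 22 linguistic features or state the n-gram size, so the five features in `url_features`, the choice of character trigrams, the helper names, and the sample URL are all assumptions made purely for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lower-cased host name of a URL (empty string if absent)."""
    return (urlparse(url).hostname or "").lower()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of `text` (n=3 is an assumption)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def url_features(url: str) -> dict[str, int]:
    """A few illustrative lexical features; NOT the paper's 22-feature set."""
    host = domain_of(url)
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dot_count": host.count("."),
        "hyphen_count": host.count("-"),
        "digit_count": sum(ch.isdigit() for ch in url),
    }


# Hypothetical sample URL, used only to exercise the helpers.
url = "http://secure-login.example.com/account?id=42"
host = domain_of(url)

# Counting n-gram occurrences gives a simple bag-of-n-grams representation
# that a count-based vectorizer could turn into a numeric feature vector.
ngram_counts = Counter(char_ngrams(host, n=3))
features = url_features(url)
```

In a fuller pipeline, the n-gram counts and the lexical features would be combined into one feature vector per URL and passed to a classifier; which vectorizers and classifiers the study used is not stated in the abstract beyond their number.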

Pages: 328-341

Page count: 14

Related records (50 in total):

[1] Automated Genre Classification of Books Using Machine Learning and Natural Language Processing
Gupta, Shikha
Agarwal, Mohit
Jain, Satbir
2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019 : 269 - 272
[2] Discover Trending Domains using Fusion of Supervised Machine Learning with Natural Language Processing
Lakhanpal, Shilpa
Gupta, Ajay
Agrawal, Rajeev
2015 18TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2015 : 893 - 900
[3] Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning
Ferrario, Andrea
Demiray, Burcu
Yordanova, Kristina
Luo, Minxia
Martin, Mike
JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (09)
[4] Network Intrusion Detection using Natural Language Processing and Ensemble Machine Learning
Das, Saikat
Ashrafuzzamant, Mohammad
Sheldon, Frederick T.
Shiva, Sajjan
2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020 : 829 - 835
[5] Detection of Fake News Using Machine Learning and Natural Language Processing Algorithms
Prachi, Noshin Nirvana
Habibullah, Md.
Rafi, Md. Emanul Haque
Alam, Evan
Khan, Riasat
JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (06) : 652 - 661
[6] Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
Pendyala, Vishnu S.
Kamdar, Karnavee
Mulchandani, Kapil
ELECTRONICS, 2025, 14 (02)
[7] Automated Priority Assignment of Building Maintenance Tasks Using Natural Language Processing and Machine Learning
D'Orazio, Marco
Bernardini, Gabriele
Di Giuseppe, Elisa
JOURNAL OF ARCHITECTURAL ENGINEERING, 2023, 29 (03)
[8] Stress detection using natural language processing and machine learning over social interactions
Nijhawan, Tanya
Attigeri, Girija
Ananthakrishna, T.
JOURNAL OF BIG DATA, 2022, 9 (01)
[9] Detection of Phishing in Mobile Instant Messaging using Natural Language Processing and Machine Learning
Verma, Suman
Ayala-Rivera, Vanessa
Portillo-Dominguez, A. Omar
2023 11TH INTERNATIONAL CONFERENCE IN SOFTWARE ENGINEERING RESEARCH AND INNOVATION, CONISOFT 2023, 2023 : 159 - 168