Detection of Algorithmically Generated Malicious Domain Names with Feature Fusion of Meaningful Word Segmentation and N-Gram Sequences

Cited by: 3

Authors:

Chen, Shaojie [1]

Lang, Bo [1,2]

Chen, Yikai [1]

Xie, Chong [1]

Affiliations:

[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China

[2] Zhongguancun Lab, Beijing 100191, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 07

Keywords:

AGD detection; meaningful word segmentation; n-gram; LSTM; feature fusion;

DOI:

10.3390/app13074406

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Domain generation algorithms (DGAs) play an important role in network attacks and can be mainly divided into two types: dictionary-based and character-based. Dictionary-based algorithmically generated domains (AGDs) are similar in composition to normal domains and are harder to detect. Although methods based on meaningful word segmentation and n-gram sequence features exhibit good detection performance for AGDs, they are inadequate for mining meaningful word features of domain names, and the performance of hybrid detection of character-based and dictionary-based AGDs needs to be further improved. Therefore, in this paper, we first describe the composition of dictionary-based AGDs using meaningful word segmentation, introduce the standard deviation to better measure the word distribution features, and construct additional 11-dimensional statistical features for word segmentation results as a supplement. Then, by combining 3-gram and 1-gram sequence features, we improve the detection performance for both character-based and dictionary-based AGDs. Finally, we perform feature fusion of the above four kinds of features to achieve an end-to-end detection method for both kinds of AGDs. Experimental results showed that our method achieved an accuracy of 97.24% on the full dataset and better accuracy and F1 values than existing methods on both dictionary-based and character-based AGD datasets.
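As a rough illustration of two feature types named in the abstract, the sketch below extracts overlapping character n-grams (1-gram and 3-gram) from a domain label and computes a standard deviation over a word segmentation result. It is not the authors' implementation: the function names `char_ngrams` and `word_length_std` are hypothetical, the segmentation is supplied by hand rather than produced by a segmentation tool, and taking the standard deviation over segment lengths is only one possible reading of "word distribution features". The 11-dimensional statistical features, the LSTM, and the fusion step are not reproduced here.

```python
from statistics import pstdev


def char_ngrams(label: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of a domain label.

    Returns an empty list when the label is shorter than n.
    """
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def word_length_std(words: list[str]) -> float:
    """Population standard deviation of segment lengths.

    Assumption: segment length is used as the measured quantity;
    the abstract does not specify which quantity the paper uses.
    """
    if not words:
        return 0.0
    return pstdev([len(w) for w in words])


# Hypothetical inputs: a label and a hand-written segmentation of it.
label = "tableofcontents"
segments = ["table", "of", "contents"]

unigrams = char_ngrams(label, 1)   # one entry per character
trigrams = char_ngrams(label, 3)   # sliding window of width 3
length_std = word_length_std(segments)
```

In a setup like the one the abstract describes, sequences such as `unigrams` and `trigrams` would typically be mapped to integer indices and fed to a sequence model, while scalar statistics such as `length_std` would be concatenated as additional features.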


Pages: 24
