Detection of Algorithmically Generated Malicious Domain Names with Feature Fusion of Meaningful Word Segmentation and N-Gram Sequences

Cited by: 3
Authors
Chen, Shaojie [1]
Lang, Bo [1,2]
Chen, Yikai [1]
Xie, Chong [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Zhongguancun Lab, Beijing 100191, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 07
Keywords
AGD detection; meaningful word segmentation; n-gram; LSTM; feature fusion
DOI
10.3390/app13074406
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Domain generation algorithms (DGAs) play an important role in network attacks and can be mainly divided into two types: dictionary-based and character-based. Dictionary-based algorithmically generated domains (AGDs) are similar in composition to normal domains and are harder to detect. Although methods based on meaningful word segmentation and n-gram sequence features exhibit good detection performance for AGDs, they are inadequate for mining meaningful word features of domain names, and the performance of hybrid detection of character-based and dictionary-based AGDs needs to be further improved. Therefore, in this paper, we first describe the composition of dictionary-based AGDs using meaningful word segmentation, introduce the standard deviation to better measure the word distribution features, and construct additional 11-dimensional statistical features for word segmentation results as a supplement. Then, by combining 3-gram and 1-gram sequence features, we improve the detection performance for both character-based and dictionary-based AGDs. Finally, we perform feature fusion of the above four kinds of features to achieve an end-to-end detection method for both kinds of AGDs. Experimental results showed that our method achieved an accuracy of 97.24% on the full dataset and better accuracy and F1 values than existing methods on both dictionary-based and character-based AGD datasets.
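The feature types named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not say how meaningful words are segmented, so a greedy longest-match over a hypothetical vocabulary stands in for it, and the population standard deviation of word lengths stands in for the word-distribution measure. The function names `char_ngrams`, `segment`, and `word_length_std` are invented for this example.

```python
from statistics import pstdev


def char_ngrams(label, n):
    """Return the sequence of overlapping character n-grams of a domain label."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def segment(label, vocab):
    """Greedy longest-match segmentation (an assumed stand-in method).

    Characters that match no vocabulary word become single-character tokens.
    """
    words = []
    i = 0
    while i < len(label):
        # Try the longest candidate substring first.
        for j in range(len(label), i, -1):
            if label[i:j] in vocab:
                words.append(label[i:j])
                i = j
                break
        else:
            words.append(label[i])
            i += 1
    return words


def word_length_std(words):
    """Population standard deviation of word lengths (assumed measure)."""
    lengths = [len(w) for w in words]
    return pstdev(lengths) if lengths else 0.0


if __name__ == "__main__":
    vocab = {"blue", "sky"}  # hypothetical vocabulary
    label = "bluesky"
    print(char_ngrams(label, 1))
    print(char_ngrams(label, 3))
    print(segment(label, vocab))
    print(word_length_std(segment(label, vocab)))
```

In a detector of the kind the abstract describes, the 1-gram and 3-gram sequences would typically be fed to a sequence model, while the segmentation statistics would form a fixed-length feature vector; how the paper fuses them is not specified in this record.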
Pages: 24
Related papers
7 items in total
  • [1] Malicious Domain Names Detection Algorithm Based on N-Gram
    Zhao, Hong
    Chang, Zhaobin
    Bao, Guangbin
    Zeng, Xiangyan
    JOURNAL OF COMPUTER NETWORKS AND COMMUNICATIONS, 2019, 2019
  • [2] Detection of algorithmically generated malicious domain names using masked N-grams
    Selvi, Jose
    Rodriguez, Ricardo J.
    Soria-Olivas, Emilio
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 124 : 156 - 163
  • [3] Algorithmically generated malicious domain names detection based on n-grams features
    Cucchiarelli, Alessandro
    Morbidoni, Christian
    Spalazzi, Luca
    Baldi, Marco
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 170
  • [4] Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation
    Guo, Zhen
    Zhang, Yujie
    Su, Chen
    Xu, Jinan
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, 2012, 333 : 121 - 131
  • [5] Enhancing Malicious Code Detection with Boosted N-gram Analysis and Efficient Feature Selection
    Javan, Nastooh Taheri
    Mohammadpour, Majid
    Mostafavi, Seyedakbar
    IEEE ACCESS, 2024
  • [6] Enhancing Malicious Code Detection With Boosted N-Gram Analysis and Efficient Feature Selection
    Javan, Nastooh Taheri
    Mohammadpour, Majid
    Mostafavi, Seyedakbar
    IEEE ACCESS, 2024, 12 : 147400 - 147421
  • [7] Boosting feature selection in a new non-adjacent N-gram for malicious code detection
    Parvin, Hamid (parvin@iust.ac.ir), 1600, CRL Publishing (22):