Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1, 2]
Pan, Jie [3 ]
Gray, Daniel [3 ]
Hu, Jiaming [3 ]
Choudhary, Chhaya [3 ]
Nascimento, Anderson C. A. [3 ]
De Cock, Martine [3, 4]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE ACCESS | 2019 / Volume 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names. The botmaster registers only a few of them, within a short time window around their generation time, and these are subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is useful in practice for improving the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
Pages: 51542-51556
Page count: 15
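The abstract contrasts deep networks, which learn features from raw domain names, with a random forest that relies on human-engineered features. As a rough illustration of what such hand-crafted lexical features might look like, here is a minimal Python sketch. The function name `lexical_features` and the four features chosen (length, character entropy, digit ratio, vowel ratio) are assumptions made for illustration only; the record does not state which features the authors used.

```python
import math
from collections import Counter


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative hand-crafted features for a domain name.

    Hypothetical feature set, not the one used in the paper.
    """
    # Simplifying assumption: only the leftmost label is analyzed.
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}

    # Shannon entropy (in bits) of the character distribution in the label.
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }
```

Feature vectors of this kind could be passed to a conventional classifier such as a random forest, whereas a deep network would typically take the character sequence itself as input.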