Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1, 2]
Pan, Jie [3]
Gray, Daniel [3]
Hu, Jiaming [3]
Choudhary, Chhaya [3]
Nascimento, Anderson C. A. [3]
De Cock, Martine [3, 4]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE Access | 2019 / Vol. 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human engineered features.
Pages: 51542-51556
Number of pages: 15