Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1, 2]
Pan, Jie [3 ]
Gray, Daniel [3 ]
Hu, Jiaming [3 ]
Choudhary, Chhaya [3 ]
Nascimento, Anderson C. A. [3 ]
De Cock, Martine [3, 4]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE ACCESS | 2019, Vol. 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster within a short time window around their generation time and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice for improving the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
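To make the term "human-engineered features" concrete, the following minimal sketch computes a few lexical features of the kind a traditional classifier such as a random forest might take as input. The function name `lexical_features` and the specific features (length, character entropy, digit ratio, vowel ratio) are assumptions chosen for illustration; the abstract does not state which features the authors used.

```python
import math
from collections import Counter


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features for a domain name.

    Hypothetical feature set for illustration only; not the features
    used in the paper.
    """
    # Use only the leftmost label, e.g. "example" from "example.com".
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}

    # Shannon entropy (in bits) of the character distribution in the label.
    counts = Counter(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "length": n,
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }


if __name__ == "__main__":
    for name in ["google.com", "xj4k2q9zpl7v.net"]:
        print(name, lexical_features(name))
```

A deep network, by contrast, would typically consume the raw character sequence and learn its own representation, which is the distinction the abstract draws.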
Pages: 51542 - 51556
Number of pages: 15
Related papers
50 in total
  • [21] Weakly Supervised Deep Learning Method for Vulnerable Road User Detection in FMCW Radar
    Dimitrievski, Martin
    Shopovska, Ivana
    Van Hamme, David
    Veelaert, Peter
    Philips, Wilfried
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [22] Transferring Deep Models for Cloud Detection in Multisensor Images via Weakly Supervised Learning
    Zhu, Shaocong
    Li, Zhiwei
    Shen, Huanfeng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 18
  • [23] Generalizable Beat-by-Beat Arrhythmia Detection by Using Weakly Supervised Deep Learning
    Liu, Yang
    Li, Qince
    He, Runnan
    Wang, Kuanquan
    Liu, Jun
    Yuan, Yongfeng
    Xia, Yong
    Zhang, Henggui
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [24] Deep Weakly Supervised Domain Adaptation for Pain Localization in Videos
    Praveen, Gnana R.
    Granger, Eric
    Cardinal, Patrick
    2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 473 - 480
  • [25] A Weakly Supervised Deep Learning Semantic Segmentation Framework
    Zhang, Jizhi
    Zhang, Guoying
    Wang, Qiangyu
    Bai, Shuang
    2017 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD), 2017, : 182 - 185
  • [26] Domain Generation Algorithms detection through deep neural network and ensemble
    Li, Shuaiji
    Huang, Tao
    Qin, Zhiwei
    Zhang, Fanfang
    Chang, Yinhong
    COMPANION OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 189 - 196
  • [27] Weakly Supervised Semantic Segmentation Based on Deep Learning
    Liang, Binxiu
    Liu, Yan
    He, Linxi
    Li, Jiangyun
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION AND CONTROL (ICMIC2019), 2020, 582 : 455 - 464
  • [28] Weakly Supervised Instance Segmentation by Deep Community Learning
    Hwang, Jaedong
    Kim, Seohyun
    Son, Jeany
    Han, Bohyung
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1019 - 1028
  • [29] Weakly Supervised Deep Learning Approach in Streaming Environments
    Pratama, Mahardhika
    Ashfahani, Andri
    Hady, Abdul
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 1195 - 1202
  • [30] Weakly Supervised Deep Metric Learning for Template Matching
    Buniatyan, Davit
    Popovych, Sergiy
    Ih, Dodam
    Macrina, Thomas
    Zung, Jonathan
    Seung, H. Sebastian
    ADVANCES IN COMPUTER VISION, CVC, VOL 1, 2020, 943 : 39 - 58