Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1, 2]
Pan, Jie [3]
Gray, Daniel [3]
Hu, Jiaming [3]
Choudhary, Chhaya [3]
Nascimento, Anderson C. A. [3]
De Cock, Martine [3, 4]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE ACCESS | 2019 / Vol. 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names. Only a few of these are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice for improving the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
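The statement that DGAs generate names "dynamically and consistently" can be illustrated with a minimal sketch. It is a toy generator written for this record, not the algorithm of any malware family and not taken from the paper; the function name `toy_dga`, the seed string, the SHA-256 derivation, and the reserved `.example` suffix are all assumptions made for illustration. The point it shows is only that two parties sharing a seed and a date derive the same candidate list without communicating, while the list changes from day to day.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".example") -> list[str]:
    """Derive `count` pseudo-random names from a shared seed and a date.

    Hypothetical illustration only: real DGAs differ in their seeding,
    alphabets, and name lengths.
    """
    domains = []
    for i in range(count):
        # Hash the seed, the date, and a counter; the result is deterministic.
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each of the first `length` hex digits to a letter in a..p.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("shared-secret", date(2019, 4, 1)):
        print(name)
```

Names produced this way tend to look like random letter strings, which is the kind of character-level signal a classifier over domain names can pick up on.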
Pages: 51542 - 51556
Number of pages: 15
Related papers
50 in total
  • [31] Weakly supervised learning of deep metrics for stereo reconstruction
    Tulyakov, Stepan
    Ivanov, Anton
    Fleuret, Francois
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1348 - 1357
  • [32] Hybrid weakly supervised learning with deep learning technique for detection of fake news from cyber propaganda
    Syed, Liyakathunisa
    Alsaeedi, Abdullah
    Alhuri, Lina A.
    Aljohani, Hutaf R.
    ARRAY, 2023, 19
  • [33] Comparative Analysis of Supervised Machine and Deep Learning Algorithms for Kyphosis Disease Detection
    Chauhan, Alok Singh
    Lilhore, Umesh Kumar
    Gupta, Amit Kumar
    Manoharan, Poongodi
    Garg, Ruchi Rani
    Hajjej, Fahima
    Keshta, Ismail
    Raahemifar, Kaamran
    APPLIED SCIENCES-BASEL, 2023, 13 (08):
  • [34] Tree-based algorithms for weakly supervised anomaly detection
    Finke, Thorben
    Hein, Marie
    Kasieczka, Gregor
    Kraemer, Michael
    Mueck, Alexander
    Prangchaikul, Parada
    Quadfasel, Tobias
    Shih, David
    Sommerhalder, Manuel
    PHYSICAL REVIEW D, 2024, 109 (03)
  • [35] Deepometry, a framework for applying supervised and weakly supervised deep learning to imaging cytometry
    Doan, Minh
    Barnes, Claire
    McQuin, Claire
    Caicedo, Juan C.
    Goodman, Allen
    Carpenter, Anne E.
    Rees, Paul
    NATURE PROTOCOLS, 2021, 16 (07) : 3572 - 3595
  • [37] GearNet: Stepwise Dual Learning for Weakly Supervised Domain Adaptation
    Xie, Renchunzi
    Wei, Hongxin
    Feng, Lei
    An, Bo
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8717 - 8725
  • [38] WEAKLY-SUPERVISED DIAGNOSIS AND DETECTION OF BREAST CANCER USING DEEP MULTIPLE INSTANCE LEARNING
    Diogo, Pedro
    Morais, Margarida
    Calisto, Francisco Maria
    Santiago, Carlos
    Aleluia, Clara
    Nascimento, Jacinto C.
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [39] PSEUDO-LABEL GENERATION-EVALUATION FRAMEWORK FOR CROSS DOMAIN WEAKLY SUPERVISED OBJECT DETECTION
    Ouyang, Shengxiong
    Wang, Xinglu
    Lyu, Kejie
    Li, Yingming
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 724 - 728
  • [40] Deep reinforcement learning for data-efficient weakly supervised business process anomaly detection
    Elaziz, Eman Abd
    Fathalla, Radwa
    Shaheen, Mohamed
    JOURNAL OF BIG DATA, 2023, 10 (01)