Malicious PDF Detection using Metadata and Structural Features

Cited by: 0
Authors
Smutz, Charles [1]
Stavrou, Angelos [1]
Affiliations
[1] George Mason Univ, Ctr Secure Informat Syst, Fairfax, VA 22030 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Owing to their versatile functionality and widespread adoption, PDF documents have become a popular avenue for user exploitation, ranging from large-scale phishing attacks to targeted attacks. In this paper, we present a framework for robust detection of malicious documents through machine learning. Our approach is based on features extracted from document metadata and structure. Using real-world datasets, we demonstrate the adequacy of these document properties for malware detection and the durability of these features across new malware variants. Our analysis shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware. Indeed, using multiple datasets containing an aggregate of over 5,000 unique malicious documents and over 100,000 benign ones, our classification rates remain well above 99% while maintaining low false positive rates of 0.2% or less for different classification parameters and experimental scenarios. Moreover, the classifier is able to detect documents crafted for targeted attacks and separate them from broadly distributed malicious PDF documents. Remarkably, we also discovered that by artificially reducing the influence of the top features in the classifier, we can still achieve a high rate of detection in an adversarial setting where the attacker is aware of both the top features utilized in the classifier and our normality model. Thus, the classifier is resilient against mimicry attacks even with knowledge of the document features, classification method, and training set.
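The two ideas in the abstract, structural features and an ensemble whose trees each see a random subset of features, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the marker list, the feature order, the toy training rows, and the function names (`extract_features`, `train_stump`, `train_forest`, `predict`) are hypothetical and are not taken from the paper. The ensemble is also deliberately simplified: each "tree" is a single-split stump and bootstrap resampling is omitted, so this is a random-subspace toy rather than full Random Forests.

```python
import random
import re

# Hypothetical structural markers, chosen for illustration only;
# the abstract does not list the paper's actual feature set.
MARKERS = (b"/JavaScript", b"/OpenAction", b"/Font")
OBJ_RE = re.compile(rb"\b\d+\s+\d+\s+obj\b")  # matches "12 0 obj", not "endobj"


def extract_features(pdf_bytes):
    """Return [JavaScript, OpenAction, Font, object count, size in bytes]."""
    counts = [pdf_bytes.count(m) for m in MARKERS]
    return counts + [len(OBJ_RE.findall(pdf_bytes)), len(pdf_bytes)]


def train_stump(X, y, feature_ids):
    """Pick the (feature, threshold, label_if_above) with fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                errors = sum(
                    (above if row[f] > t else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]


def train_forest(X, y, n_trees=25, n_sub=2, seed=0):
    """Train n_trees stumps, each restricted to a random subset of n_sub features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [
        train_stump(X, y, rng.sample(range(n_features), n_sub))
        for _ in range(n_trees)
    ]


def predict(forest, row):
    """Majority vote over all stumps: 1 = malicious, 0 = benign."""
    votes = sum(above if row[f] > t else 1 - above for f, t, above in forest)
    return 1 if 2 * votes > len(forest) else 0


# Made-up feature rows in the order returned by extract_features.
# The last column (size) is interleaved between classes, so it acts as noise.
X = [
    [0, 0, 5, 40, 100], [0, 0, 3, 25, 300], [0, 0, 8, 60, 500],  # benign
    [1, 1, 0, 6, 200], [2, 1, 0, 9, 400], [1, 1, 1, 12, 600],    # malicious
]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)

SAMPLE = (
    b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
sample_features = extract_features(SAMPLE)
```

Because each stump sees only two of the five features, no single feature decides every vote. This is only loosely related to the abstract's resilience claim, which rests on the authors' own experiments rather than on anything this toy can show.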
Pages: 239 - 248
Page count: 10
Related papers
50 in total
  • [31] Structural Analysis of URL For Malicious URL Detection Using Machine Learning
    Raja, A. Saleem
    Peerbasha, S.
    Iqbal, Y. Mohammed
    Sundarvadivazhagan, B.
    Surputheen, M. Mohamed
    JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 2023, 5 (04) : 28 - 41
  • [32] Malicious URI resolving in PDF documents
    Hamon, Valentin
    JOURNAL IN COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2013, 9 (02) : 65 - 76
  • [33] Detection of Malicious PDF Files Using a Two-Stage Machine Learning Algorithm
    He, Kang
    Zhu, Yuefei
    He, Yubo
    Liu, Long
    Lu, Bin
    Lin, Wei
    CHINESE JOURNAL OF ELECTRONICS, 2020, 29 (06) : 1165 - 1177
  • [34] Features combination for the detection of malicious Twitter accounts
    David, Isaac
    Siordia, Oscar S.
    Moctezuma, Daniela
    2016 IEEE INTERNATIONAL AUTUMN MEETING ON POWER, ELECTRONICS AND COMPUTING (ROPEC), 2016,
  • [35] Malicious JavaScript Detection by Features Extraction
    Canfora, Gerardo
    Mercaldo, Francesco
    Visaggio, Corrado Aaron
    E-INFORMATICA SOFTWARE ENGINEERING JOURNAL, 2014, 8 (01) : 65 - 78
  • [36] Lexical features based malicious URL detection using machine learning techniques
    Saleem Raja, A.
    Vinodini, R.
    Kavitha, A.
    MATERIALS TODAY-PROCEEDINGS, 2021, 47 : 163 - 166
  • [37] A STATIC DETECTION MODEL OF MALICIOUS PDF DOCUMENTS BASED ON NAIVE BAYESIAN CLASSIFIER TECHNOLOGY
    Cheng, Huang
    Yong, Fang
    Liang, Liu
    Wang, Lu-Rong
    2012 INTERNATIONAL CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICWAMTIP), 2012, : 29 - 32
  • [38] Detecting Malicious PDF Files Using Semi-Supervised Learning Method
    Feng, Di
    Yu, Min
    Wang, Yongjian
    Liu, Chao
    Ma, Chunguang
    5TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE APPLICATIONS AND TECHNOLOGIES (ACSAT 2017), 2017, : 1 - 9
  • [39] A practical approach on clustering malicious PDF documents
    Cristina Vatamanu
    Dragoş Gavriluţ
    Răzvan Benchea
    Journal in Computer Virology, 2012, 8 (4) : 151 - 163
  • [40] A practical approach on clustering malicious PDF documents
    Vatamanu, Cristina
    Gavrilut, Dragos
    Benchea, Razvan
    JOURNAL IN COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2012, 8 (04) : 151 - 163