Malicious PDF Detection using Metadata and Structural Features

Authors
Smutz, Charles [1]
Stavrou, Angelos [1]
Affiliations
[1] George Mason Univ, Ctr Secure Informat Syst, Fairfax, VA 22030 USA
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Owing to their versatile functionality and widespread adoption, PDF documents have become a popular avenue for user exploitation, ranging from large-scale phishing attacks to targeted attacks. In this paper, we present a framework for robust detection of malicious documents through machine learning. Our approach is based on features extracted from document metadata and structure. Using real-world datasets, we demonstrate the adequacy of these document properties for malware detection and the durability of these features across new malware variants. Our analysis shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware. Indeed, using multiple datasets containing an aggregate of over 5,000 unique malicious documents and over 100,000 benign ones, our classification rates remain well above 99% while maintaining low false positive rates of 0.2% or less for different classification parameters and experimental scenarios. Moreover, the classifier can detect documents crafted for targeted attacks and separate them from broadly distributed malicious PDF documents. Remarkably, we also discovered that by artificially reducing the influence of the top features in the classifier, we can still achieve a high rate of detection in an adversarial setting where the attacker is aware of both the top features utilized in the classifier and our normality model. Thus, the classifier is resilient against mimicry attacks even when the attacker knows the document features, classification method, and training set.
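As a rough illustration of what "features extracted from document metadata and structure" could look like, the sketch below counts a few structural keywords and one metadata property directly in raw PDF bytes. The abstract does not list the paper's actual features, so the keyword list, the feature names, and the `extract_features` helper are all assumptions made for illustration. A feature vector of this kind would then be passed to a classifier such as Random Forests, which is not shown here.

```python
import re

# Hypothetical structural keywords, chosen for illustration only;
# the abstract does not enumerate the features used in the paper.
KEYWORDS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Page")


def extract_features(pdf_bytes: bytes) -> dict:
    """Count simple structural and metadata properties in raw PDF bytes."""
    features = {
        "size_bytes": len(pdf_bytes),
        # Indirect object headers look like "12 0 obj".
        "count_obj": len(re.findall(rb"\d+\s+\d+\s+obj\b", pdf_bytes)),
        # Count "stream" keywords but not the "endstream" terminators.
        "count_stream": len(re.findall(rb"(?<!end)stream\b", pdf_bytes)),
    }
    for kw in KEYWORDS:
        # The lookahead stops "/Page" from also matching "/Pages".
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        name = "count_" + kw[1:].decode("ascii")
        features[name] = len(re.findall(pattern, pdf_bytes))
    # Metadata example: length of the /Title string, 0 if absent.
    title = re.search(rb"/Title\s*\(([^)]*)\)", pdf_bytes)
    features["title_len"] = len(title.group(1)) if title else 0
    return features


# A tiny hand-written PDF-like byte string (not a valid, renderable PDF).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"5 0 obj\n<< /Title (Invoice) /Producer (demo) >>\nendobj\n"
    b"%%EOF\n"
)
features = extract_features(sample)
print(features)
```

Counting keywords in raw bytes keeps the sketch independent of any PDF parsing library, at the cost of missing content inside compressed or obfuscated streams.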
Pages: 239-248
Page count: 10
Related papers
50 in total
  • [1] Malicious PDF Files Detection Using Structural and Javascript Based Features
    Dabral, Sonal
    Agarwal, Amit
    Mahajan, Manish
    Kumar, Sachin
    INFORMATION, COMMUNICATION AND COMPUTING TECHNOLOGY, 2017, 750 : 137 - 147
  • [2] A Study of Malicious PDF Detection Technique
    Iwamoto, Mai
    Oshima, Shunsuke
    Nakashima, Takuo
    PROCEEDINGS OF 2016 10TH INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS (CISIS), 2016, : 197 - 203
  • [3] A Structural and Content-based Approach for a Precise and Robust Detection of Malicious PDF Files
    Maiorca, Davide
    Ariu, Davide
    Corona, Igino
    Giacinto, Giorgio
    2015 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY (ICISSP), 2015, : 27 - 36
  • [4] An Evasion Resilient Approach to the Detection of Malicious PDF Files
    Maiorca, Davide
    Ariu, Davide
    Corona, Igino
    Giacinto, Giorgio
    INFORMATION SYSTEMS SECURITY AND PRIVACY, ICISSP 2015, 2015, 576 : 68 - 85
  • [5] FEPDF: A Robust Feature Extractor for Malicious PDF Detection
    Li, Min
    Liu, Yunzheng
    Yu, Min
    Li, Gang
    Wang, Yongjian
    Liu, Chao
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017, : 218 - 224
  • [6] Research and Improvement of Feature Engineering for Malicious PDF Detection
    Huang N.
    He J.
    Wu Y.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2022, 51 (05): : 766 - 773
  • [7] Malicious PDF document detection based on mixed feature
    Du X.
    Lin Y.
    Sun Y.
    Tongxin Xuebao/Journal on Communications, 2019, 40 (02): : 118 - 128
  • [8] Application of deep reinforcement learning in attacking and protecting structural features-based malicious PDF detector
    Jiang, Tian
    Liu, Yunqi
    Wu, Xuemeng
    Xu, Mohan
    Cui, Xiaohui
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 141 : 325 - 338
  • [9] Double-Layer Detection Model of Malicious PDF Documents Based on Entropy Method with Multiple Features
    Song, Enzhou
    Hu, Tao
    Yi, Peng
    Wang, Wenbo
    ENTROPY, 2023, 25 (07)
  • [10] Malicious origami in PDF
    Raynal, Frederic
    Delugre, Guillaume
    Aumaitre, Damien
    JOURNAL IN COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2010, 6 (04): : 289 - 315