Malicious PDF Detection using Metadata and Structural Features

Authors
Smutz, Charles [1]
Stavrou, Angelos [1]
Affiliations
[1] George Mason Univ, Ctr Secure Informat Syst, Fairfax, VA 22030 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Owing to their versatile functionality and widespread adoption, PDF documents have become a popular avenue for user exploitation, ranging from large-scale phishing attacks to targeted attacks. In this paper, we present a framework for robust detection of malicious documents through machine learning. Our approach is based on features extracted from document metadata and structure. Using real-world datasets, we demonstrate the adequacy of these document properties for malware detection and the durability of these features across new malware variants. Our analysis shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware. Indeed, using multiple datasets containing an aggregate of over 5,000 unique malicious documents and over 100,000 benign ones, our classification rates remain well above 99% while maintaining a low false positive rate of 0.2% or less across different classification parameters and experimental scenarios. Moreover, the classifier is able to detect documents crafted for targeted attacks and to separate them from broadly distributed malicious PDF documents. Remarkably, we also discovered that by artificially reducing the influence of the top features in the classifier, we can still achieve a high rate of detection in an adversarial setting where the attacker is aware of both the top features utilized in the classifier and our normality model. Thus, the classifier is resilient against mimicry attacks even when the attacker knows the document features, classification method, and training set.
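To put the reported rates in perspective, the following minimal sketch computes a true positive rate and a false positive rate from confusion counts. The function name `detection_rates` and all four counts are assumptions made for illustration: the counts are chosen only to match the scale mentioned in the abstract (about 5,000 malicious and 100,000 benign documents) and are not results from the paper. The abstract's "classification rate" may also be defined differently from the true positive rate used here.

```python
def detection_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (true positive rate, false positive rate) from confusion counts."""
    malicious_total = true_pos + false_neg
    benign_total = false_pos + true_neg
    if malicious_total == 0 or benign_total == 0:
        raise ValueError("need at least one malicious and one benign document")
    tpr = true_pos / malicious_total   # share of malicious documents detected
    fpr = false_pos / benign_total     # share of benign documents flagged
    return tpr, fpr


# Hypothetical counts (NOT the paper's results), sized to the abstract's scale.
tpr, fpr = detection_rates(true_pos=4960, false_neg=40,
                           false_pos=200, true_neg=99800)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")  # prints "TPR = 99.20%, FPR = 0.20%"
```

At this scale, a false positive rate of 0.2% still corresponds to roughly 200 benign documents flagged per 100,000, which is why the false positive rate matters as much as the detection rate when benign documents vastly outnumber malicious ones.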
Pages: 239-248
Page count: 10