A Universal Malicious Documents Static Detection Framework Based on Feature Generalization

Cited by: 9
Authors
Lu, Xiaofeng [1 ]
Wang, Fei [1 ]
Jiang, Cheng [1 ]
Lio, Pietro [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
[2] Univ Cambridge, Comp Lab, Cambridge CB3 0FD, England
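Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 24
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords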
malicious document detection; static detection; feature generalization; machine learning;
DOI
10.3390/app112412134
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
In this study, Portable Document Format (PDF), Word, Excel, Rich Text Format (RTF) and image documents are taken as the research objects to study a static and fast method by which to detect malicious documents. Malicious PDF and Word document features are abstracted and extended, which can be used to detect other types of documents. A universal static detection framework for malicious documents based on feature generalization is then proposed. The generalized features include specification check errors, the structure path, code keywords, and the number of objects. The proposed method is verified on two datasets, and is compared with Kaspersky, NOD32, and McAfee antivirus software. The experimental results demonstrate that the proposed method achieves good performance in terms of the detection accuracy, runtime, and scalability. The average F1-score of all types of documents is found to be 0.99, and the average detection time of a document is 0.5926 s, which is at the same level as the compared antivirus software.
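As a rough illustration of two of the generalized features named in the abstract (code keywords and the number of objects), the following Python sketch counts them in raw document bytes. It is not the authors' implementation: the keyword list, the function name `extract_features`, and the use of PDF-style `N G obj` markers to approximate the object count are all assumptions made for this example; this record does not give the paper's actual feature definitions.

```python
import re
from typing import Dict

# Hypothetical keyword list chosen for illustration only; the paper's
# actual keyword set is not stated in this record.
CODE_KEYWORDS = (b"/JavaScript", b"/OpenAction", b"/Launch", b"AutoOpen", b"Shell")

# Assumption: objects are approximated by PDF-style "N G obj" markers.
OBJECT_MARKER = re.compile(rb"\d+\s+\d+\s+obj\b")


def extract_features(data: bytes) -> Dict[str, int]:
    """Return keyword counts and an approximate object count for raw bytes."""
    features = {"kw:" + kw.decode("ascii"): data.count(kw) for kw in CODE_KEYWORDS}
    features["num_objects"] = len(OBJECT_MARKER.findall(data))
    return features


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    for name, value in extract_features(sample).items():
        print(name, value)
```

A vector like this could be passed to any standard classifier; the specification-check and structure-path features mentioned in the abstract would need a format-aware parser and are left out of the sketch.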
Pages: 23
Related papers
50 in total
  • [1] UFADF: A Unified Feature Analysis and Detection Framework for Malicious Office Documents
    Hu, Yang
    Chen, Jia
    Luo, Xin
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 881 - 888
  • [2] A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning
    Yang W.
    Gao M.
    Jiang T.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (05): : 1021 - 1034
  • [3] Static detection of malicious JavaScript-bearing PDF documents
    Laskov, Pavel
    Šrndić, Nedim
    ACM International Conference Proceeding Series, 2011, : 373 - 382
  • [4] A STATIC DETECTION MODEL OF MALICIOUS PDF DOCUMENTS BASED ON NAIVE BAYESIAN CLASSIFIER TECHNOLOGY
    Cheng, Huang
    Yong, Fang
    Liang, Liu
    Wang, Lu-Rong
    2012 INTERNATIONAL CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (LCWAMTIP), 2012, : 29 - 32
  • [5] The De-Obfuscation Method in the Static Detection of Malicious PDF Documents
    Wang, Yuntao
    Proceedings - 2021 7th Annual International Conference on Network and Information Systems for Computers, ICNISC 2021, 2021, : 44 - 47
  • [6] Static Detection of Malicious JavaScript-Bearing PDF Documents
    Laskov, Pavel
    Srndic, Nedim
    27TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE (ACSAC 2011), 2011, : 373 - 382
  • [7] Feature Selection Framework for Optimizing ML-based Malicious URL Detection
    Shah, Sajjad H.
    Garu, Amit
    Nguyen, Duong N.
    Borowczak, Mike
    2024 CYBER AWARENESS AND RESEARCH SYMPOSIUM, CARS 2024, 2024,
  • [8] Feature representation and selection in malicious code detection methods based on static system calls
    Ding Yuxin
    Yuan Xuebing
    Zhou Di
    Dong Li
    An Zhanchao
    COMPUTERS & SECURITY, 2011, 30 (6-7) : 514 - 524
  • [9] An efficient malicious webpage static detection framework based on optimized Bayesian and hybrid machine learning
    Yang, Fan
    Zhu, Chaoqun
    Xu, Heng
    Qian, Yongfeng
    Song, Jun
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (10):
  • [10] Static detection of malicious PowerShell based on word embeddings
    Mimura, Mamoru
    Tajiri, Yui
    INTERNET OF THINGS, 2021, 15