Large-Scale Malicious Software Classification With Fuzzified Features and Boosted Fuzzy Random Forest

Cited by: 7
Authors
Li, Fang-Qi [1]
Wang, Shi-Lin [1]
Liew, Alan Wee-Chung [2]
Ding, Weiping [3]
Liu, Gong-Shen [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Griffith Univ, Sch Informat & Commun Technol, Gold Coast, Qld 4222, Australia
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware; Feature extraction; Machine learning; Decision trees; Forestry; Support vector machines; Boosted random forest; computer security; fuzzy decision tree; malware classification; MACHINE; SYSTEM;
DOI
10.1109/TFUZZ.2020.3016023
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification of malicious software, especially in a very large dataset, is a challenging task for machine intelligence. Malware can have highly diversified features, each of which has a highly heterogeneous distribution. These factors make it difficult for traditional data-analytic approaches to handle them. Although deep-learning-based methods have reported good classification performance, deep models usually lack interpretability and are fragile under adversarial attacks. To address these problems, fuzzy systems have emerged as a competitive candidate for malware analysis. In this article, a new fuzzy-based approach is proposed for malware classification. We focused on portable executable (PE) files on the Windows platform and analyzed the distributions of static features and content-oriented features. Fuzzification was used to reduce the ubiquitous impact of noise and outliers in a very large dataset. Finally, a novel boosted classifier consisting of fuzzy decision trees and support vector machines is proposed to perform the malware classification. By using fuzzy decision trees, the inner structure of the classifier can be readily interpreted as discriminative rules, whereas the novel boosting strategy provides state-of-the-art classification performance. Extensive experimental results showed that our method significantly outperformed several state-of-the-art classifiers.
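The abstract states that fuzzification is used to reduce the impact of noise and outliers but does not describe the membership functions. The following Python sketch shows one common way to fuzzify a single numeric feature; it is an illustration under stated assumptions, not the authors' method. It assumes three overlapping triangular fuzzy sets whose peaks sit at the 5th, 50th and 95th percentiles of the training values, and the helpers `quantile_centers` and `triangular_memberships` are hypothetical names.

```python
from bisect import bisect_right


def quantile_centers(values, n_sets=3, lo=0.05, hi=0.95):
    """Hypothetical helper (assumption, not from the paper): put the peaks of
    n_sets (>= 2) fuzzy sets at evenly spaced quantiles between lo and hi of
    the training values, so the extreme tails do not stretch the sets."""
    data = sorted(values)
    n = len(data)
    centers = []
    for i in range(n_sets):
        q = lo + (hi - lo) * i / (n_sets - 1)
        pos = q * (n - 1)                      # fractional rank
        j = int(pos)
        nxt = data[min(j + 1, n - 1)]
        # Linear interpolation between neighbouring order statistics.
        centers.append(data[j] + (pos - j) * (nxt - data[j]))
    # Merge identical peaks (e.g. a feature that is constant over most files).
    return sorted(set(centers))


def triangular_memberships(x, centers):
    """Hypothetical helper: membership degrees of scalar x in triangular fuzzy
    sets peaked at the sorted `centers`. Adjacent sets overlap, so the degrees
    sum to 1; x outside the outer peaks is clipped and gets full membership in
    the edge set, which bounds the influence of extreme values."""
    k = len(centers)
    if k < 2:
        raise ValueError("need at least two distinct centers")
    xc = min(max(x, centers[0]), centers[-1])
    # Left peak of the interval [centers[i], centers[i + 1]] containing xc.
    i = min(bisect_right(centers, xc) - 1, k - 2)
    w = (xc - centers[i]) / (centers[i + 1] - centers[i])
    mu = [0.0] * k
    mu[i] = 1.0 - w        # degree in the lower set ("low" side)
    mu[i + 1] = w          # degree in the upper set ("high" side)
    return mu


# Usage on a hypothetical numeric feature whose training values are 0..100.
centers = quantile_centers(range(101))
print([round(c, 6) for c in centers])                              # [5.0, 50.0, 95.0]
print([round(m, 3) for m in triangular_memberships(27.5, centers)])  # [0.5, 0.5, 0.0]
print(triangular_memberships(10**9, centers))                      # [0.0, 0.0, 1.0]
```

With this scheme an extreme value such as `10**9` lands fully in the top set, exactly like any other large value, instead of dominating a raw threshold or distance computation. A fuzzy decision tree can then test degrees such as "feature IS high" rather than crisp cut points; how the paper's trees and boosting strategy consume fuzzified features is not specified in the abstract.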
Pages: 3205-3218
Page count: 14
Related papers
  • [1] MalNet: A Large-Scale Image Database of Malicious Software
    Freitas, Scott
    Duggal, Rahul
    Chau, Duen Horng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 3948 - 3952
  • [2] Online Adaptive Kernel Learning with Random Features for Large-scale Nonlinear Classification
    Chen, Yingying
    Yang, Xiaowei
    PATTERN RECOGNITION, 2022, 131
  • [3] LARGE-SCALE RANDOM FEATURES FOR KERNEL REGRESSION
    Laparra, Valero
    Gonzalez, Diego Marcos
    Tuia, Devis
    Camps-Valls, Gustau
    2015 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2015: 17 - 20
  • [4] Large-Scale Random Forest Language Models for Speech Recognition
    Su, Yi
    Jelinek, Frederick
    Khudanpur, Sanjeev
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 945 - 948
  • [5] Incremental Learning of Random Forests for Large-Scale Image Classification
    Ristin, Marko
    Guillaumin, Matthieu
    Gall, Juergen
    Van Gool, Luc
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (03) : 490 - 503
  • [6] Random Sampling Method of Large-Scale Graph Data Classification
    Mustafa, Rashed
    Mahmud, Mohammad Sultan
    Shadid, Mahir
    JURNAL KEJURUTERAAN, 2024, 36 (02) : 525 - 532
  • [7] Large-Scale Identification of Malicious Singleton Files
    Li, Bo
    Roundy, Kevin
    Gates, Chris
    Vorobeychik, Yevgeniy
    PROCEEDINGS OF THE SEVENTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY (CODASPY'17), 2017: 227 - 238
  • [8] Deep Fuzzy Tree for Large-Scale Hierarchical Visual Classification
    Wang, Yu
    Hu, Qinghua
    Zhu, Pengfei
    Li, Linhao
    Lu, Bingxu
    Garibaldi, Jonathan M.
    Li, Xianling
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2020, 28 (07) : 1395 - 1406
  • [9] ON LARGE-SCALE CLASSIFICATION PROBLEMS USING FUZZY SETS
    Stoica, M.
    Stancu-Minasian, I.M.
    Scarlat, E.
    ECONOMIC COMPUTATION AND ECONOMIC CYBERNETICS STUDIES AND RESEARCH, 1977, (01) : 93 - 102
  • [10] Evaluating the performance of random forest for large-scale flood discharge simulation
    Schoppa, Lukas
    Disse, Markus
    Bachmair, Sophie
    JOURNAL OF HYDROLOGY, 2020, 590