Large-Scale Malicious Software Classification With Fuzzified Features and Boosted Fuzzy Random Forest

Times cited: 7
Authors
Li, Fang-Qi [1]
Wang, Shi-Lin [1]
Liew, Alan Wee-Chung [2]
Ding, Weiping [3]
Liu, Gong-Shen [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[2] Griffith Univ, Sch Informat & Commun Technol, Gold Coast, Qld 4222, Australia
[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Malware; Feature extraction; Machine learning; Decision trees; Forestry; Support vector machines; Boosted random forest; computer security; fuzzy decision tree; malware classification; MACHINE; SYSTEM
DOI
10.1109/TFUZZ.2020.3016023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification of malicious software, especially in a very large dataset, is a challenging task for machine intelligence. Malware can have highly diversified features, each with a highly heterogeneous distribution. These factors make malware difficult for traditional data-analytic approaches to handle. Although deep-learning-based methods have reported good classification performance, deep models usually lack interpretability and are fragile under adversarial attacks. To address these problems, fuzzy systems have become a competitive candidate in malware analysis. In this article, a new fuzzy-based approach is proposed for malware classification. We focused on portable executable files on the Windows platform and analyzed the distributions of static features and content-oriented features. Fuzzification was used to reduce the ubiquitous impact of noise and outliers in a very large dataset. Finally, a novel boosted classifier consisting of fuzzy decision trees and a support vector machine is proposed to perform the malware classification. Because fuzzy decision trees are used, the inner structure of the classifier can be readily interpreted as discriminative rules, while the novel boosting strategy provides state-of-the-art classification performance. Extensive experimental results showed that our method significantly outperformed several state-of-the-art classifiers.
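The abstract says fuzzification is used to damp noise and outliers in heavily skewed feature distributions, but it does not give the membership functions. As a hedged illustration only, the sketch below assumes triangular membership functions whose centers sit at the 10th, 50th and 90th percentiles of a single raw feature (our assumption, not necessarily the authors' choice). The helper names `quantile_centers` and `fuzzify`, and the sample values, are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_centers(values: Sequence[float],
                     levels: Sequence[float] = (0.1, 0.5, 0.9)) -> List[float]:
    """Pick membership-function centers at the given quantile levels.

    Using inner quantiles (rather than min/max) keeps a single extreme
    value from stretching the fuzzy sets. Nearest-rank quantiles are used.
    """
    s = sorted(values)
    n = len(s)
    return [s[round(q * (n - 1))] for q in levels]


def fuzzify(x: float, centers: Sequence[float]) -> List[float]:
    """Map a raw value to memberships in triangular fuzzy sets.

    Set i peaks (membership 1) at centers[i] and falls linearly to 0 at the
    neighbouring centers; the first and last sets saturate at 1 beyond the
    outermost centers, so any outlier gets a bounded representation.
    Memberships sum to 1 (up to float rounding). `centers` must be ascending.
    """
    k = len(centers)
    mu = [0.0] * k
    if x <= centers[0]:
        mu[0] = 1.0
        return mu
    if x >= centers[-1]:
        mu[-1] = 1.0
        return mu
    # Find i with centers[i] <= x < centers[i + 1], then interpolate linearly.
    i = bisect_right(centers, x) - 1
    t = (x - centers[i]) / (centers[i + 1] - centers[i])
    mu[i] = 1.0 - t
    mu[i + 1] = t
    return mu


# Hypothetical raw feature values (e.g. section sizes in KB) with one outlier.
sizes = [10, 12, 15, 20, 25, 30, 40, 60, 100, 10_000_000]
centers = quantile_centers(sizes)       # [12, 25, 100]
print(fuzzify(62.5, centers))           # [0.0, 0.5, 0.5]
print(fuzzify(10_000_000, centers))     # [0.0, 0.0, 1.0]
```

Under this assumed partition, the outlier of 10,000,000 receives exactly the same "high" membership vector as a value of 100, which is one way fuzzification can bound the influence of extreme values; the paper's actual fuzzification may differ.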
Pages: 3205-3218
Number of pages: 14
Related papers
50 in total
  • [41] Ladle Furnace Temperature Prediction Model Based on Large-scale Data With Random Forest
    Wang, Xiaojun
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2017, 4 (04) : 770 - 774
  • [43] Large-scale multivariate forecasting models for Dengue - LSTM versus random forest regression
    Mussumeci, Elisa
    Coelho, Flavio Codeco
    SPATIAL AND SPATIO-TEMPORAL EPIDEMIOLOGY, 2020, 35
  • [44] Android malicious behavior recognition and classification method based on random forest algorithm
    Ke D.-X.
    Pan L.-M.
    Luo S.-L.
    Zhang H.-Q.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2019, 53 (10) : 2013 - 2023
  • [45] What Is Large in Large-Scale? A Taxonomy of Scale for Agile Software Development
    Dingsoyr, Torgeir
    Faegri, Tor Erlend
    Itkonen, Juha
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2014, 2014, 8892 : 273 - 276
  • [46] Random forest ensemble classification based fuzzy logic
    Ben Ayed, Abdelkarim
    Benhammouda, Marwa
    Ben Halima, Mohamed
    Alimi, Adel M.
    NINTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2016), 2017, 10341
  • [47] UNIX FEATURES FOR LARGE-SCALE MAINFRAMES
    SEGAL, BM
    ROBERTSON, LM
    PROCEEDINGS : SEAS ANNIVERSARY MEETING 1989, VOLS 1 AND 2: THE CORPORATE NETWORK, 1989, : 859 - 863
  • [48] Large-Scale Web Page Classification
    Marath, Sathi T.
    Shepherd, Michael
    Milios, Evangelos
    Duffy, Jack
    2014 47TH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2014, : 1813 - 1822
  • [49] Large-scale Packet Classification on FPGA
    Zhou, Shijie
    Qu, Yun R.
    Prasanna, Viktor K.
    PROCEEDINGS OF THE ASAP2015 2015 IEEE 26TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2015, : 226 - 233
  • [50] Large-Scale Robust Semisupervised Classification
    Zhang, Lingling
    Luo, Minnan
    Li, Zhihui
    Nie, Feiping
    Zhang, Huaxiang
    Liu, Jun
    Zheng, Qinghua
    IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (03) : 907 - 917