Large-Scale Malicious Software Classification With Fuzzified Features and Boosted Fuzzy Random Forest

Cited by: 7

Authors:

Li, Fang-Qi [1]

Wang, Shi-Lin [1]

Liew, Alan Wee-Chung [2]

Ding, Weiping [3]

Liu, Gong-Shen [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China

[2] Griffith Univ, Sch Informat & Commun Technol, Gold Coast, Qld 4222, Australia

[3] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China

Source:

IEEE TRANSACTIONS ON FUZZY SYSTEMS | 2021 / Vol. 29 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Malware; Feature extraction; Machine learning; Decision trees; Forestry; Support vector machines; Boosted random forest; computer security; fuzzy decision tree; malware classification; MACHINE; SYSTEM;

DOI:

10.1109/TFUZZ.2020.3016023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Classification of malicious software, especially in a very large dataset, is a challenging task for machine intelligence. Malware can have highly diversified features, each of which has a highly heterogeneous distribution. These factors make it harder for traditional data-analytic approaches to handle such data. Although deep-learning-based methods have reported good classification performance, deep models usually lack interpretability and are fragile under adversarial attacks. To solve these problems, fuzzy systems have become a competitive candidate in malware analysis. In this article, a new fuzzy-based approach is proposed for malware classification. We focused on portable executable files on the Windows platform and analyzed the distributions of static features and content-oriented features. Fuzzification was used to reduce the ubiquitous impact of noise and outliers in a very large dataset. Finally, a novel boosted classifier consisting of fuzzy decision trees and a support vector machine is proposed to perform the malware classification. By using fuzzy decision trees, the inner structure of the classifier can be readily interpreted as discriminative rules, whereas the novel boosting strategy provides state-of-the-art classification performance. Extensive experimental results showed that our method significantly outperformed several state-of-the-art classifiers.

Pages: 3205 - 3218

Number of pages: 14

