Robust malware clustering of Windows portable executables using ensemble latent representation and distribution modeling

Cited by: 0
Authors
Rizvi, Syed Khurram Jah [1]
Fraz, Muhammad Moazam [1, 2]
Affiliations
[1] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Islamabad, Pakistan
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Sect H-12, Islamabad 44000, Pakistan
Source
Keywords
autoencoder; clustering of portable executable; distribution modeling; ensemble neural network; static analysis; FEATURE-EXTRACTION; NETWORK;
DOI
10.1002/cpe.7621
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Malware is malicious software used to gain unauthorized access to organizational infrastructure and systems. To cope with the exponential growth of malware, notable research has addressed unsupervised clustering of Windows portable executables (PEs). Nevertheless, to the best of our knowledge, no prior work has addressed robust cluster prediction of Windows PEs using static features. To this end, we propose an ensemble neural network architecture that performs unsupervised feature learning and models the distribution of the learned features for robust clustering of PEs. The architecture is a cascade of a deep autoencoder (AE) network and a latent distribution modeling (LDM) network. The AE learns a latent representation of the features, and the LDM models the distribution of that latent representation using a Gaussian approximation. An objective function is also devised for model optimization. The network adjusts the Gaussian components to optimize the distribution modeling, and it moves data representations toward their related Gaussian centers so that the model behaves adaptively. To assess the proposed architecture, a novel malware dataset was collected by deploying an endpoint security management solution over an enterprise network. The dataset contains 21,486 samples, of which 14,497 are malicious and 6,989 are benign. We also evaluated the proposed architecture on a publicly available benchmark malware dataset of 138,047 samples, comprising 96,742 malicious and 41,323 benign PEs. The experimental results show that the proposed architecture achieves more than 95% accuracy for cluster prediction and outperforms existing techniques. The dataset and implementation are accessible at .
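The two-stage idea in the abstract (learn a latent representation, then model its distribution with Gaussian components) can be illustrated with a minimal sketch. This is not the authors' implementation: the deep AE is replaced here by a linear autoencoder computed via PCA, the LDM network is replaced by plain EM for a diagonal-covariance Gaussian mixture, and the joint objective that pulls representations toward Gaussian centers is omitted. All function names (`linear_autoencoder`, `init_means`, `fit_gmm`, `cluster_samples`) and parameters are hypothetical.

```python
import numpy as np


def linear_autoencoder(X, latent_dim):
    """Stand-in for the deep AE: an optimal linear autoencoder, computed via PCA."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:latent_dim].T  # encoder weights; the decoder would be W.T

    def encode(A):
        return (A - mean) @ W

    return encode


def init_means(Z, k):
    """Deterministic farthest-point initialization of k Gaussian centers."""
    idx = [0]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d2 = ((Z[:, None, :] - Z[idx][None]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return Z[idx].copy()


def fit_gmm(Z, k, n_iter=30):
    """EM for a diagonal-covariance Gaussian mixture over latent codes Z."""
    n, _ = Z.shape
    means = init_means(Z, k)
    # Start from hard assignments to the nearest initial center.
    d2 = ((Z[:, None, :] - means[None]) ** 2).sum(axis=2)
    resp = np.eye(k)[d2.argmin(axis=1)]
    for _ in range(n_iter):
        # M-step: re-estimate mixture weights, centers, and variances.
        nk = resp.sum(axis=0) + 1e-9
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        variances = np.maximum((resp.T @ Z ** 2) / nk[:, None] - means ** 2, 1e-6)
        # E-step: responsibilities from per-component log densities.
        sq = ((Z[:, None, :] - means) ** 2) / variances
        log_p = np.log(weights) - 0.5 * (sq + np.log(2 * np.pi * variances)).sum(axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), means


def cluster_samples(X, latent_dim=2, k=2):
    """Encode feature vectors, then assign each to its most likely Gaussian."""
    encode = linear_autoencoder(X, latent_dim)
    labels, _ = fit_gmm(encode(X), k)
    return labels
```

Calling `cluster_samples(X, latent_dim=2, k=2)` on a matrix of static PE feature vectors (one row per sample) returns one cluster index per row. In the architecture the abstract describes, the encoder and the Gaussian components are optimized together, so this decoupled sketch only shows the data flow.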
Pages: 14
Related papers
50 in total
  • [1] Malware Classification of Portable Executables using Tree-Based Ensemble Machine Learning
    Atluri, Venkata
    2019 IEEE SOUTHEASTCON, 2019,
  • [2] Cross-validation of machine learning algorithms for malware detection using static features of Windows portable executables: A Comparative Study
    Aslam, Warda
    Fraz, M. M.
    Rizvi, S. K.
    Saleem, S.
    2020 IEEE 17TH INTERNATIONAL CONFERENCE ON SMART COMMUNITIES: IMPROVING QUALITY OF LIFE USING ICT, IOT AND AI (IEEEHONET 2020), 2020, : 73 - 77
  • [3] Windows PE Malware Detection Using Ensemble Learning
    Azeez, Nureni Ayofe
    Odufuwa, Oluwanifise Ebunoluwa
    Misra, Sanjay
    Oluranti, Jonathan
    Damasevicius, Robertas
    INFORMATICS-BASEL, 2021, 8 (01):
  • [4] Robust Subspace Clustering via Latent Smooth Representation Clustering
    Xiao, Xiaobo
    Wei, Lai
    NEURAL PROCESSING LETTERS, 2020, 52 (02) : 1317 - 1337
  • [5] Robust Subspace Clustering via Latent Smooth Representation Clustering
    Xiaobo Xiao
    Lai Wei
    Neural Processing Letters, 2020, 52 : 1317 - 1337
  • [6] Malware Detection for Portable Executables Using a Multi-input Transformer-based Approach
    Huoh, Ting-Li
    Miskell, Timothy
    Barut, Onur
    Luo, Yan
    Li, Peilong
    Zhang, Tong
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024, : 778 - 782
  • [7] Robust latent low rank representation for subspace clustering
    Zhang, Hongyang
    Lin, Zhouchen
    Zhang, Chao
    Gao, Junbin
    NEUROCOMPUTING, 2014, 145 : 369 - 373
  • [8] Network ensemble clustering using latent roles
    Brandes, Ulrik
    Lerner, Juergen
    Nagel, Uwe
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2011, 5 (02) : 81 - 94
  • [9] Network ensemble clustering using latent roles
    Ulrik Brandes
    Jürgen Lerner
    Uwe Nagel
    Advances in Data Analysis and Classification, 2011, 5 : 81 - 94
  • [10] Malware Detection on Windows Portable Executables with Long-Short Term Memory Trained on PCA Selected and TF-IDF Engineered Windows API Call Sequences
    De Goma, Joel C.
    Dela Vega, John Eemmauel J.
    Geneta, Daniel M.
    Gil, Claire Francheska M.
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2024, 2024, : 423 - 429