Robust malware clustering of Windows portable executables using ensemble latent representation and distribution modeling

Authors
Rizvi, Syed Khurram Jah [1]
Fraz, Muhammad Moazam [1, 2]
Affiliations
[1] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Islamabad, Pakistan
[2] Natl Univ Sci & Technol, Sch Elect Engn & Comp Sci, Sect H-12, Islamabad 44000, Pakistan
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2023 / Vol. 35 / No. 08
Keywords
autoencoder; clustering of portable executable; distribution modeling; ensemble neural network; static analysis; FEATURE-EXTRACTION; NETWORK
DOI
10.1002/cpe.7621
CLC number (Chinese Library Classification)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Malware is a malicious program used to gain unauthorized access to organizational infrastructure and systems. To address the challenges posed by the exponential growth of malware, notable research has been conducted on unsupervised clustering of Windows-based portable executables (PEs). Nevertheless, to the best of our knowledge, there has been no research on robust cluster prediction of Windows-based PEs using static features. To this end, we propose an ensemble neural network architecture for unsupervised feature learning and for modeling the distribution of the learned features, enabling robust clustering of PEs. The novel architecture is a cascade of a deep autoencoder (AE) network and a latent distribution modeling (LDM) network. The AE learns features as a latent representation, and the LDM models the distribution of that latent representation using a Gaussian approximation. An objective function is also devised for model optimization. The network adjusts the Gaussian components to optimize the distribution modeling, and it also moves the data representations toward their associated Gaussian centers so that the model behaves adaptively. To assess the proposed architecture, a novel malware dataset has also been collected by deploying an endpoint security management solution over an enterprise network. The dataset contains 21,486 samples, of which 14,497 are malicious and 6,989 are benign. We also evaluated the proposed architecture on a publicly available benchmark malware dataset of 138,047 samples comprising 96,742 malicious and 41,323 benign PEs. The experimental results demonstrate that the proposed architecture yields more than 95% accuracy for cluster prediction, outperforming state-of-the-art techniques. The dataset and the implementation are accessible at .
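The abstract describes the LDM stage only at a high level and does not state the objective function. As a rough illustration, the Python sketch below assumes diagonal-covariance Gaussian components over the latent codes produced by an autoencoder. All function names, the weighting parameters `lam_nll` and `lam_pull`, and the exact form of the loss are hypothetical and are not taken from the paper. The sketch shows soft assignment of a latent vector to Gaussian components, cluster prediction by the most probable component, a combined loss (reconstruction error, latent negative log-likelihood, and a responsibility-weighted pull toward component centers), and one EM-style re-estimation of the components.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch (not the authors' implementation): Gaussian-mixture
# modeling of autoencoder latent codes with diagonal covariances.

Vec = Sequence[float]


def log_gaussian(z: Vec, mu: Vec, var: Vec) -> float:
    """Log density of z under a Gaussian with mean mu and diagonal variances var."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (zi - mi) ** 2 / v)
        for zi, mi, v in zip(z, mu, var)
    )


def _component_logs(z: Vec, weights: Vec, means, variances) -> List[float]:
    """log(pi_k) + log N(z | mu_k, var_k) for every component k."""
    return [math.log(w) + log_gaussian(z, m, v) for w, m, v in zip(weights, means, variances)]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def responsibilities(z: Vec, weights: Vec, means, variances) -> List[float]:
    """Posterior probability of each component given z (a stable softmax)."""
    logs = _component_logs(z, weights, means, variances)
    norm = _logsumexp(logs)
    return [math.exp(l - norm) for l in logs]


def predict_cluster(z: Vec, weights: Vec, means, variances) -> int:
    """Cluster prediction: index of the most probable Gaussian component."""
    r = responsibilities(z, weights, means, variances)
    return max(range(len(r)), key=r.__getitem__)


def objective(x: Vec, x_hat: Vec, z: Vec, weights: Vec, means, variances,
              lam_nll: float = 0.1, lam_pull: float = 0.01) -> float:
    """Hypothetical loss: reconstruction MSE + latent NLL + pull toward centers."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    nll = -_logsumexp(_component_logs(z, weights, means, variances))
    r = responsibilities(z, weights, means, variances)
    # Squared distance from z to each center, weighted by its soft assignment.
    pull = sum(rk * sum((zi - mi) ** 2 for zi, mi in zip(z, mk)) for rk, mk in zip(r, means))
    return recon + lam_nll * nll + lam_pull * pull


def update_components(latents: List[Vec], resp: List[List[float]], eps: float = 1e-6
                      ) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """One EM-style M-step: re-estimate weights, means and diagonal variances."""
    k, d = len(resp[0]), len(latents[0])
    counts = [sum(r[j] for r in resp) + eps for j in range(k)]
    total = sum(counts)
    weights = [c / total for c in counts]
    means, variances = [], []
    for j in range(k):
        mu = [sum(r[j] * z[i] for r, z in zip(resp, latents)) / counts[j] for i in range(d)]
        # eps keeps variances strictly positive when a component collapses.
        var = [sum(r[j] * (z[i] - mu[i]) ** 2 for r, z in zip(resp, latents)) / counts[j] + eps
               for i in range(d)]
        means.append(mu)
        variances.append(var)
    return weights, means, variances


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [5.0, 5.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    print(predict_cluster([0.2, -0.1], weights, means, variances))  # 0
    print(predict_cluster([4.8, 5.3], weights, means, variances))   # 1
```

In a trained network these quantities would be computed on mini-batches and differentiated, so that minimizing the pull term moves latent codes toward their centers while the M-step re-fits the components. The plain-Python version only makes the arithmetic explicit.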
Pages: 14