Word Embeddings for Fake Malware Generation

Cited by: 0
Authors
Tran, Quang Duy [1]
Di Troia, Fabio [1]
Affiliation
[1] San Jose State University, San Jose, CA 95192, USA
Source
SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2022 | 2022 / Vol. 1683
Keywords
BERT; GAN; Malware; Malware detection; Word embedding
DOI
10.1007/978-3-031-24049-2_2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. In recent years, however, this type of threat has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to counter threats to information security. Despite the integration of machine learning models, however, a shortage of training data still prevents these models from performing at their peak. Generative models have previously proven highly effective at generating image-like data that closely follows the real data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples: the generated malware is so similar to actual malware that the classifiers have difficulty distinguishing between the two, falsely identifying the generated malware as actual malware almost 90% of the time.
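The abstract treats opcode sequences as text so that BERT can produce contextualized embeddings for them. This record does not describe the paper's preprocessing, so the following is only a minimal sketch, assuming a hypothetical opcode vocabulary built from the training corpus and BERT's usual `[CLS]`/`[SEP]`/`[PAD]`/`[UNK]` conventions. The names `build_vocab`, `encode`, and `max_len` are illustrative, not the authors' code. The sketch shows how a disassembled opcode sequence could be turned into the input ids and attention mask that a BERT-style encoder consumes.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens following BERT's conventions; the paper's
# actual tokenizer and vocabulary are not described in this record.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct opcode an integer id after the special tokens."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for seq in sequences:
        for op in seq:
            if op not in vocab:
                vocab[op] = len(vocab)
    return vocab


def encode(seq: List[str], vocab: Dict[str, int],
           max_len: int) -> Tuple[List[int], List[int]]:
    """Wrap an opcode sequence as [CLS] ... [SEP], truncate, and pad.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    unk = vocab["[UNK]"]
    # Unseen opcodes map to [UNK]; keep at most max_len - 2 opcodes.
    body = [vocab.get(op, unk) for op in seq][: max_len - 2]
    ids = [vocab["[CLS]"]] + body + [vocab["[SEP]"]]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * pad, mask + [0] * pad


if __name__ == "__main__":
    corpus = [["push", "mov", "call", "pop"], ["mov", "xor", "jmp"]]
    vocab = build_vocab(corpus)
    ids, mask = encode(["mov", "call", "nop"], vocab, max_len=8)
    print(ids)   # → [2, 5, 6, 1, 3, 0, 0, 0]
    print(mask)  # → [1, 1, 1, 1, 1, 0, 0, 0]
```

In this toy run the unseen opcode `nop` maps to `[UNK]`, and the attention mask marks the padded positions an encoder should ignore. A fixed `max_len` is used because BERT-style models take fixed-length batches; long opcode sequences are simply truncated here, which is one of several possible choices.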
Pages: 22-37
Number of pages: 16