Word Embeddings for Fake Malware Generation

Times cited: 0

Authors:

Tran, Quang Duy [1]

Di Troia, Fabio [1]

Affiliation:

[1] San Jose State Univ, San Jose, CA 95192 USA

Source:

SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2022 | 2022 / Vol. 1683

Keywords:

BERT; GAN; Malware; Malware detection; Word embedding;

DOI:

10.1007/978-3-031-24049-2_2

Chinese Library Classification (CLC) code:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. However, in recent years malware has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to combat threats to information security. Nevertheless, despite the integration of machine learning models, a shortage of training data still prevents these models from performing at their peak. Generative models have previously proven highly effective at producing image-like data that closely matches the actual data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. The generated malware has characteristics so similar to those of actual malware that the classifiers have difficulty distinguishing between the two: they falsely identify the generated malware as actual malware almost 90% of the time.
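The evaluation described in the abstract, in which classifiers try to tell real malware from generated samples, comes down to measuring how often a generated sample is labeled as real. The following is a minimal sketch of that metric under stated assumptions. Each sample is assumed to already be a fixed-length vector (hypothetically, a pooled BERT embedding of an opcode sequence). A simple nearest-centroid classifier stands in for the classifiers used in the paper, which this record does not name. The names `fit_nearest_centroid`, `predict_is_real`, and `fooled_rate` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical model: (centroid of real malware vectors, centroid of generated vectors)
Model = Tuple[List[float], List[float]]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(real: List[Vector], generated: List[Vector]) -> Model:
    """Stand-in classifier (an assumption, not the paper's model):
    one centroid per class, computed from training vectors."""
    return centroid(real), centroid(generated)


def predict_is_real(model: Model, v: Vector) -> bool:
    """Label a vector as real malware if it is at least as close
    to the real-malware centroid as to the generated-sample centroid."""
    real_c, gen_c = model
    return sq_dist(v, real_c) <= sq_dist(v, gen_c)


def fooled_rate(model: Model, generated: List[Vector]) -> float:
    """Fraction of generated samples that the classifier mislabels as real malware."""
    if not generated:
        raise ValueError("generated must be non-empty")
    fooled = sum(1 for v in generated if predict_is_real(model, v))
    return fooled / len(generated)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings (illustrative data only).
    model = fit_nearest_centroid([[0, 0], [0, 1], [1, 0]], [[5, 5], [5, 6]])
    held_out_generated = [[0.5, 0.5], [4, 4], [0.2, 0.1], [6, 6]]
    print(fooled_rate(model, held_out_generated))
```

In a real evaluation, the generated samples passed to `fooled_rate` would be held out from the data used to fit the classifier. A rate near 0.9 would correspond to the "almost 90%" misclassification figure stated in the abstract.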

Pages: 22-37

Number of pages: 16