Word Embeddings for Fake Malware Generation

Cited by: 0

Authors:

Tran, Quang Duy [1]

Di Troia, Fabio [1]

Affiliations:

[1] San Jose State Univ, San Jose, CA 95192 USA

Source:

SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2022 | 2022 / Vol. 1683

Keywords:

BERT; GAN; Malware; Malware detection; Word embedding;

DOI:

10.1007/978-3-031-24049-2_2

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. In recent years, however, malware has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to combat threats to information security. Nevertheless, even with machine learning models in place, a shortage of training data still prevents these models from performing at their peak. Generative models have previously been found to be highly effective at generating image-like data that closely follow the actual data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. We observe that the generated malware is so similar to actual malware that classifiers have difficulty distinguishing between the two: they falsely identify the generated malware as actual malware almost 90% of the time.


Pages: 22-37

Page count: 16

