Word Embeddings for Fake Malware Generation

Authors
Tran, Quang Duy [1]
Di Troia, Fabio [1]
Affiliations
[1] San Jose State Univ, San Jose, CA 95192 USA
Source
SILICON VALLEY CYBERSECURITY CONFERENCE, SVCC 2022 | 2022 / Vol. 1683
Keywords
BERT; GAN; Malware; Malware detection; Word embedding
DOI
10.1007/978-3-031-24049-2_2
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. In recent years, however, malware has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to combat threats to information security. Despite the adoption of machine learning models, a shortage of training data still prevents these models from performing at their peak. Generative models have previously been found highly effective at generating image-like data that resemble the actual data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. The generated malware has characteristics so similar to those of actual malware that classifiers have difficulty distinguishing between the two: they falsely identify the generated malware as actual malware almost 90% of the time.
Pages: 22-37
Page count: 16