CodeGraphSMOTE - Data Augmentation for Vulnerability Discovery

Authors
Ganz, Tom [1]
Imgrund, Erik [1]
Haerterich, Martin [1]
Rieck, Konrad [2]
Affiliations
[1] SAP Secur Res, Walldorf, Germany
[2] Tech Univ Berlin, Berlin, Germany
Keywords
Vulnerability Discovery; Data Augmentation; Graph Neural Networks
DOI
10.1007/978-3-031-37586-6_17
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The automated discovery of vulnerabilities at scale is a crucial area of research in software security. While numerous machine learning models for detecting vulnerabilities are known, recent studies show that their generalizability and transferability depend heavily on the quality of the training data. Because real vulnerabilities are scarce, available datasets are highly imbalanced, which makes it difficult for deep learning models to learn and generalize effectively. Since programs can inherently be represented as graphs, and to leverage recent advances in graph neural networks, we propose a novel method that generates synthetic code graphs for data augmentation to enhance vulnerability discovery. Our method makes two significant contributions: a novel approach for generating synthetic code graphs and a graph-to-code transformer that converts code graphs into their code representation. Applied to vulnerability discovery models, our augmentation strategy achieves the originally reported F1-score with less than 20% of the original dataset, and it outperforms prior augmentation strategies by up to 25.6% in F1-score.
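The abstract does not spell out how the synthetic samples are generated. For orientation only, here is a minimal sketch of classic SMOTE (Synthetic Minority Over-sampling Technique), the oversampling idea the name CodeGraphSMOTE presumably alludes to: each new minority sample is a random interpolation between a minority sample and one of its k nearest minority neighbours. This is not the authors' method. It works on plain feature vectors rather than code graphs, and treating those vectors as graph embeddings of vulnerable functions is a hypothetical assumption, as are the function name `smote` and its parameters.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE on plain feature vectors (illustrative sketch only).

    minority: list of equal-length float lists, e.g. hypothetical
    embeddings of vulnerable (minority-class) samples.
    Returns n_new synthetic vectors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = max(1, min(k, len(minority) - 1))  # clamp k to a usable range
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        others = [x for j, x in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between base and neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


vulnerable = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote(vulnerable, n_new=5, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the five points generated here stay inside the unit square spanned by `vulnerable`. The paper's contribution, according to the abstract, is doing this kind of generation for code graphs and then mapping the results back to code with a graph-to-code transformer, which this vector-space sketch does not attempt.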
Pages: 282-301
Number of pages: 20