CodeGraphSMOTE - Data Augmentation for Vulnerability Discovery

Cited by: 1
Authors
Ganz, Tom [1]
Imgrund, Erik [1]
Haerterich, Martin [1]
Rieck, Konrad [2]
Affiliations
[1] SAP Security Research, Walldorf, Germany
[2] Technische Universität Berlin, Berlin, Germany
Keywords
Vulnerability Discovery; Data Augmentation; Graph Neural Networks
DOI
10.1007/978-3-031-37586-6_17
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The automated discovery of vulnerabilities at scale is a crucial area of research in software security. While numerous machine learning models for detecting vulnerabilities are known, recent studies show that their generalizability and transferability depend heavily on the quality of the training data. Because real vulnerabilities are scarce, available datasets are highly imbalanced, which makes it difficult for deep learning models to learn and generalize effectively. Since programs can naturally be represented as graphs, and to leverage recent advances in graph neural networks, we propose a novel method that generates synthetic code graphs for data augmentation to enhance vulnerability discovery. Our method makes two significant contributions: a novel approach for generating synthetic code graphs, and a graph-to-code transformer that converts code graphs back into their code representation. When our augmentation strategy is applied to vulnerability discovery models, they reach the originally reported F1-score with less than 20% of the original dataset, and they outperform prior augmentation strategies by up to 25.6% in F1-score.
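The title refers to SMOTE (Synthetic Minority Over-sampling Technique), the classic remedy for the class imbalance described above. The abstract does not say how the synthetic code graphs are produced, so the following is only a minimal sketch of the generic SMOTE interpolation step, not the authors' method. It assumes each vulnerable sample has already been encoded as a fixed-length vector (for example, an embedding from a graph neural network); the function name `smote` and its parameters are hypothetical. The graph-to-code transformer mentioned in the abstract is not modelled here.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 5,
    seed: int = 0,
) -> List[Vector]:
    """Generic SMOTE sketch (hypothetical helper, not the paper's code).

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage: oversample three (assumed) 2-D embeddings of vulnerable functions.
vulnerable = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(vulnerable, n_new=4, k=2)
```

Interpolating only between minority samples keeps every synthetic vector inside the region already spanned by real vulnerable examples; here every generated point lies inside the triangle formed by the three inputs.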
Pages: 282-301
Number of pages: 20