CodeGraphSMOTE - Data Augmentation for Vulnerability Discovery

Times cited: 1

Authors:

Ganz, Tom [1]

Imgrund, Erik [1]

Haerterich, Martin [1]

Rieck, Konrad [2]

Affiliations:

[1] SAP Security Research, Walldorf, Germany

[2] Technische Universität Berlin, Berlin, Germany

Source:

DATA AND APPLICATIONS SECURITY AND PRIVACY XXXVII, DBSEC 2023 | 2023 / Vol. 13942

Keywords:

Vulnerability Discovery; Data Augmentation; Graph Neural Networks

DOI:

10.1007/978-3-031-37586-6_17

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The automated discovery of vulnerabilities at scale is a crucial area of research in software security. While numerous machine learning models for detecting vulnerabilities exist, recent studies show that their generalizability and transferability depend heavily on the quality of the training data. Because real vulnerabilities are scarce, available datasets are highly imbalanced, which makes it difficult for deep learning models to learn and generalize effectively. Since programs can naturally be represented as graphs, and to leverage recent advances in graph neural networks, we propose a novel method that generates synthetic code graphs for data augmentation to enhance vulnerability discovery. Our method makes two main contributions: a novel approach for generating synthetic code graphs, and a graph-to-code transformer that converts code graphs into their corresponding source code. Applying our augmentation strategy to vulnerability discovery models achieves the originally reported F1-score with less than 20% of the original dataset, and outperforms prior augmentation strategies by up to 25.6% in F1-score.
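The method's name refers to SMOTE (Synthetic Minority Over-sampling Technique), a standard way to counter class imbalance: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows only classic SMOTE on plain, fixed-length feature vectors. It is an illustration under that assumption and does not reproduce the paper's code-graph generation or its graph-to-code transformer, whose details are not given in this record. The function `smote` and its parameters are hypothetical names chosen for the example.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Vector]:
    """Classic SMOTE on plain feature vectors (hypothetical helper).

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: _euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Three minority points; every synthetic point falls on an edge of
    # the triangle they span.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(minority, n_synthetic=4, k=2):
        print(point)
```

Because every new point is a convex combination of two real minority samples, plain SMOTE can only fill in the region the minority class already occupies; applying this idea to programs is harder, since an interpolated vector does not directly correspond to valid code, which is the gap the abstract's graph-to-code transformer is meant to close.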


Pages: 282-301

Page count: 20
