Deep neural-based vulnerability discovery demystified: data, model and performance

Cited by: 14

Authors:

Lin, Guanjun [1]

Xiao, Wei [2]

Zhang, Leo Yu [3]

Gao, Shang [3]

Tai, Yonghang [4]

Zhang, Jun [5]

Affiliations:

[1] Sanming Univ, Sch Informat Engn, Sanming, Fujian, Peoples R China

[2] Changchun Univ Technol, Sch Comp Sci & Engn, Changchun, Jilin, Peoples R China

[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia

[4] Yunnan Normal Univ, Yunnan Key Lab Optoelect Informat Technol, Kunming, Yunnan, Peoples R China

[5] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia

Source:

NEURAL COMPUTING & APPLICATIONS | 2021 / Vol. 33 / Issue 20

Keywords:

Vulnerability discovery; Deep learning; Function-level; Baseline dataset; Performance evaluation;

DOI:

10.1007/s00521-021-05954-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Detecting source-code-level vulnerabilities during the development phase is a cost-effective way to prevent potential attacks at the software deployment stage. Many machine learning-based solutions, including deep learning-based ones, have been proposed to aid the process of vulnerability discovery. However, these approaches were mainly evaluated on self-constructed or self-collected datasets, and the lack of a unified baseline dataset makes it difficult to evaluate their effectiveness. To bridge this gap, we construct a function-level vulnerability dataset from scratch, provided as source code-label pairs. To evaluate the constructed dataset, we build a function-level vulnerability detection framework that incorporates six mainstream neural network models as vulnerability detectors. We perform experiments to investigate the performance behaviors of the neural model-based detectors, using source code as raw input with continuous Bag-of-Words neural embeddings. Empirical results reveal that the variants of recurrent neural networks and the convolutional neural network perform well on our dataset, as the former are capable of handling contextual information and the latter learns features from small context windows. In terms of generalization ability, the fully connected network outperforms the other network architectures. The performance evaluation can serve as a reference benchmark for neural model-based vulnerability detection at function-level granularity. Our dataset can serve as ground truth for ML-based function-level vulnerability detection and as a baseline for evaluating relevant approaches.
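The abstract mentions feeding raw source code to the detectors through continuous Bag-of-Words (CBOW) embeddings. As a rough illustration of what CBOW training data looks like for code, the sketch below tokenizes a C function and builds (context, target) pairs. The tokenizer regex, the window size of 2, and the sample function are assumptions made for this example; the abstract does not specify the authors' tokenization or hyperparameters, so this is not their pipeline.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer (an assumption, not from the paper): identifiers,
# integer literals, and any other single non-space character become tokens.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> List[str]:
    """Split C-like source text into a flat list of tokens."""
    return TOKEN_PATTERN.findall(source)


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build CBOW training pairs: the surrounding tokens predict the center token."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip a token that has no neighbours at all
            pairs.append((context, target))
    return pairs


# Illustrative function-level sample (made up for this sketch).
sample = "int f(char *s) { char buf[8]; strcpy(buf, s); return 0; }"
tokens = tokenize(sample)
pairs = cbow_pairs(tokens, window=2)

for context, target in pairs[:3]:
    print(context, "->", target)
```

In a full pipeline, an embedding model would be trained on such pairs and each function would then be represented as a sequence of token vectors passed to a classifier with a vulnerable/non-vulnerable label.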

Pages: 13287 - 13300

Number of pages: 14
