Deep neural-based vulnerability discovery demystified: data, model and performance

Cited by: 14
Authors
Lin, Guanjun [1]
Xiao, Wei [2]
Zhang, Leo Yu [3]
Gao, Shang [3]
Tai, Yonghang [4]
Zhang, Jun [5]
Affiliations
[1] Sanming Univ, Sch Informat Engn, Sanming, Fujian, Peoples R China
[2] Changchun Univ Technol, Sch Comp Sci & Engn, Changchun, Jilin, Peoples R China
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
[4] Yunnan Normal Univ, Yunnan Key Lab Optoelect Informat Technol, Kunming, Yunnan, Peoples R China
[5] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2021 / Vol. 33 / No. 20
Keywords
Vulnerability discovery; Deep learning; Function-level; Baseline dataset; Performance evaluation
DOI
10.1007/s00521-021-05954-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Detecting source-code-level vulnerabilities during the development phase is a cost-effective way to prevent potential attacks from occurring after the software is deployed. Many machine learning-based solutions, including deep learning-based ones, have been proposed to aid vulnerability discovery. However, these approaches were mainly evaluated on self-constructed or self-collected datasets, and the lack of a unified baseline dataset makes it difficult to evaluate their effectiveness. To bridge this gap, we construct a function-level vulnerability dataset from scratch, provided as source-code/label pairs. To evaluate the constructed dataset, we build a function-level vulnerability detection framework that incorporates six mainstream neural network models as vulnerability detectors. We perform experiments to investigate the performance of these neural model-based detectors, using source code as raw input represented with continuous Bag-of-Words (CBOW) neural embeddings. Empirical results reveal that variants of recurrent neural networks and the convolutional neural network perform well on our dataset: the former can handle contextual information, while the latter learns features from small context windows. In terms of generalization ability, the fully connected network outperforms the other network architectures. The performance evaluation can serve as a reference benchmark for neural model-based vulnerability detection at function-level granularity. Our dataset can serve as ground truth for ML-based function-level vulnerability detection and as a baseline for evaluating related approaches.
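The abstract states only that the detectors take source code as raw input through continuous Bag-of-Words (CBOW) embeddings. As a rough illustration of what CBOW training data looks like for code, the sketch below tokenizes a small C function and emits (context, target) pairs; a CBOW model learns to predict each target token from the tokens surrounding it. The regex lexer, the window size of 2, the sample function, its label, and the names `tokenize` and `cbow_pairs` are all hypothetical choices for illustration, not the authors' preprocessing.

```python
import re
from typing import List, Tuple

# Hypothetical lexer (an assumption, not the paper's): identifiers/keywords,
# integer literals, and any other single non-space character become tokens.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    """Split C-like source text into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context, target) pairs: every token is a target, and up to
    `window` tokens on each side of it form its context."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-token input has no context to learn from
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    # Toy source-code/label pair; the label 1 ("vulnerable") is illustrative only.
    source, label = "int f(char *s) { char b[8]; strcpy(b, s); }", 1
    tokens = tokenize(source)
    print("label:", label, "tokens:", len(tokens))
    for context, target in cbow_pairs(tokens)[:3]:
        print(context, "->", target)
```

In practice the pairs are seldom built by hand: a word2vec implementation such as gensim's `Word2Vec` trains a CBOW model (`sg=0`) directly from lists of tokens, and the resulting token vectors would then feed the neural detectors.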
Pages: 13287-13300
Number of pages: 14
Related papers (50 in total)
  • [2] Model combination in neural-based forecasting
    Freitas, Paulo S. A.
    Rodrigues, António J. L.
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2006, 173 (03) : 801 - 814
  • [3] Parallel Neural-based Hybrid Data Mining Ensemble
    Hassan, Syed Zahid
    Verma, Brijesh
    ISSNIP 2008: PROCEEDINGS OF THE 2008 INTERNATIONAL CONFERENCE ON INTELLIGENT SENSORS, SENSOR NETWORKS, AND INFORMATION PROCESSING, 2008, : 115 - 119
  • [4] Exploiting deep convolutional neural networks for a neural-based learning classifier system
    Kim, Ji-Yoon
    Cho, Sung-Bae
    NEUROCOMPUTING, 2019, 354 : 61 - 70
  • [5] Neural-Based Compression Scheme for Solar Image Data
    Zafari, Ali
    Khoshkhahtinat, Atefeh
    Grajeda, Jeremy A.
    Mehta, Piyush M.
    Nasrabadi, Nasser M.
    Boucheron, Laura E.
    Thompson, Barbara J.
    Kirk, Michael S. F.
    da Silva, Daniel E.
    IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024, 60 (01) : 918 - 933
  • [6] Deep Neural Embedding for Software Vulnerability Discovery: Comparison and Optimization
    Yuan, Xue
    Lin, Guanjun
    Tai, Yonghang
    Zhang, Jun
    SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [8] Cooking Up a Neural-based Model for Recipe Classification
    Mohammadi, Elham
    Naji, Nada
    Marceau, Louis
    Queudot, Marc
    Charton, Eric
    Kosseim, Leila
    Meurs, Marie-Jean
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5000 - 5009
  • [9] AGENT BASED VULNERABILITY DISCOVERY MODEL
    Dobrovoljc, Andrej
    SOR'13 PROCEEDINGS: THE 12TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH IN SLOVENIA, 2013, : 379 - 384
  • [10] A Comprehensive Analysis of Deep Neural-Based Cerebral Microbleeds Detection System
    Ferlin, Maria Anna
    Grochowski, Michal
    Kwasigroch, Arkadiusz
    Mikolajczyk, Agnieszka
    Szurowska, Edyta
    Grzywinska, Malgorzata
    Sabisz, Agnieszka
    ELECTRONICS, 2021, 10 (18)