A Comparison of Source Code Representation Methods to Predict Vulnerability Inducing Code Changes

Authors
Halepmollasi, Rusen [1, 2]
Hanifi, Khadija [3]
Fouladi, Ramin F. [3]
Tosun, Ayse [1]
Affiliations
[1] Istanbul Tech Univ, Istanbul, Turkiye
[2] TUBITAK Informat & Informat Secur Res Ctr, Kocaeli, Turkiye
[3] Ericsson Secur Res, Istanbul, Turkiye
Keywords
Software Vulnerabilities; Software Metrics; Embeddings; Abstract Syntax Tree
DOI
10.5220/0011859300003464
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Vulnerability prediction is a data-driven process that relies on previous vulnerability records and their associated fixes in software development projects. Vulnerability records are rare compared to other defects, even in large projects, and are usually not directly linked to the related code changes in the bug tracking system. Preparing a vulnerability dataset and building a prediction model is therefore quite challenging. Many studies propose software metrics-based or embedding/token-based approaches to predict software vulnerabilities over code changes. In this study, we compare the performance of two approaches in predicting code changes that induce vulnerabilities. The first approach is based on an aggregation of software metrics, while the second is based on an embedding representation of the source code built using an Abstract Syntax Tree and skip-gram techniques. We employed Deep Learning and popular Machine Learning algorithms to predict vulnerability-inducing code changes. We report our empirical analysis over code changes on the publicly available SmartSHARK dataset, which we extended by adding real vulnerability data. The software metrics-based code representation method shows better classification performance than the embedding-based code representation method in terms of recall, precision, and F1-score.
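The embedding-based representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes Python's standard `ast` module as a stand-in parser (the abstract does not name the parser or the programming language analyzed), and the helper names `ast_node_tokens` and `skipgram_pairs` are hypothetical. The sketch flattens an Abstract Syntax Tree into a sequence of node-type tokens and then builds the (center, context) pairs that a skip-gram model would be trained on.

```python
import ast
from typing import List, Tuple


def ast_node_tokens(source: str) -> List[str]:
    """Flatten the AST of `source` into node-type names (pre-order traversal).

    Assumption: node-type names serve as tokens; the paper may use a
    different tokenization of the tree.
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each token with every neighbor at most `window` positions away."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical code change used only for illustration.
snippet = "def add(a, b):\n    return a + b\n"
tokens = ast_node_tokens(snippet)
pairs = skipgram_pairs(tokens, window=2)
```

In a full pipeline, pairs like these would be passed to a skip-gram trainer to learn one vector per token, and the vectors of a code change would then be combined into a fixed-length input for a classifier. Those steps are omitted here because the abstract does not specify them.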
Pages: 469-478
Page count: 10
Related papers
50 in total
  • [1] Source Code Vulnerability Detection Using Vulnerability Dependency Representation Graph
    Yang, Hongyu
    Yang, Haiyun
    Zhang, Liang
    Cheng, Xiang
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022, : 457 - 464
  • [2] Comparison of Source Code Storage Methods
    Pinter, Adam
    Szenasi, Sandor
    IEEE JOINT 19TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS AND 7TH INTERNATIONAL CONFERENCE ON RECENT ACHIEVEMENTS IN MECHATRONICS, AUTOMATION, COMPUTER SCIENCES AND ROBOTICS (CINTI-MACRO 2019), 2019, : 231 - 236
  • [3] Summarizing source code with hierarchical code representation
    Zhou, Ziyi
    Yu, Huiqun
    Fan, Guisheng
    Huang, Zijie
    Yang, Xingguang
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 143
  • [4] Towards Attention Based Vulnerability Discovery Using Source Code Representation
    Kim, Junae
    Hubczenko, David
    Montague, Paul
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 731 - 746
  • [5] Automated Vulnerability Detection in Source Code Using Deep Representation Learning
    Russell, Rebecca L.
    Kim, Louis
    Hamilton, Lei H.
    Lazovich, Tomo
    Harer, Jacob A.
    Ozdemir, Onur
    Ellingwood, Paul M.
    McConley, Marc W.
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 757 - 762
  • [6] Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning
    Li, Xin
    Wang, Lu
    Xin, Yang
    Yang, Yixian
    Chen, Yuling
    APPLIED SCIENCES-BASEL, 2020, 10 (05):
  • [7] On the Vulnerability of Large Corpora Source Code
    Barr, Joseph R.
    Thatcher, Tyler
    16TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2022), 2022, : 314 - 317
  • [8] Source code comparison
    Jones, B
    DR DOBBS JOURNAL, 1998, 23 (05): : 12 - 12
  • [9] A Vulnerability Detection System Based on Fusion of Assembly Code and Source Code
    Li, Xingzheng
    Feng, Bingwen
    Li, Guofeng
    Li, Tong
    He, Mingjin
    SECURITY AND COMMUNICATION NETWORKS, 2021, 2021
  • [10] CROP: Linking Code Reviews to Source Code Changes
    Paixao, Matheus
    Krinke, Jens
    Han, Donggyun
    Harman, Mark
    2018 IEEE/ACM 15TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR), 2018, : 46 - 49