A Comparison of Source Code Representation Methods to Predict Vulnerability Inducing Code Changes

Cited by: 0
Authors
Halepmollasi, Rusen [1, 2]
Hanifi, Khadija [3]
Fouladi, Ramin F. [3]
Tosun, Ayse [1]
Affiliations
[1] Istanbul Tech Univ, Istanbul, Turkiye
[2] TUBITAK Informat & Informat Secur Res Ctr, Kocaeli, Turkiye
[3] Ericsson Secur Res, Istanbul, Turkiye
Keywords
Software Vulnerabilities; Software Metrics; Embeddings; Abstract Syntax Tree
DOI
10.5220/0011859300003464
CLC number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
Vulnerability prediction is a data-driven process that uses previous vulnerability records and their associated fixes in software development projects. Vulnerability records are rarely observed compared to other defects, even in large projects, and they are usually not directly linked to the related code changes in the bug tracking system. Thus, preparing a vulnerability dataset and building a prediction model are quite challenging. Many studies propose software metrics-based or embedding/token-based approaches to predict software vulnerabilities from code changes. In this study, we compare the performance of two different approaches in predicting code changes that induce vulnerabilities. The first approach is based on an aggregation of software metrics; the second is based on an embedding representation of the source code built with an Abstract Syntax Tree and skip-gram techniques. We employed Deep Learning and popular Machine Learning algorithms to predict vulnerability-inducing code changes. We report our empirical analysis of code changes in the publicly available SmartSHARK dataset, which we extended by adding real vulnerability data. The software metrics-based code representation method shows better classification performance than the embedding-based method in terms of recall, precision, and F1-score.
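The two code representations compared in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the metric names, the sum/mean/max aggregation, the zero default for a missing metric, and the helper names (`aggregate_metrics`, `ast_node_sequence`, `skipgram_pairs`) are all hypothetical. Python's built-in `ast` module stands in for the Java AST tooling that a study on SmartSHARK projects would actually need. The skip-gram part only generates the (target, context) pairs that a skip-gram model is trained on; learning the embedding vectors themselves (for example with a word2vec implementation) is omitted.

```python
import ast
from statistics import mean


def aggregate_metrics(file_metrics):
    """Aggregate per-file metric dicts for all files touched by one code change.

    Hypothetical aggregation: sum, mean and max of each metric; a metric
    missing for a file is assumed to be 0.
    """
    names = sorted({name for metrics in file_metrics for name in metrics})
    features = {}
    for name in names:
        values = [metrics.get(name, 0) for metrics in file_metrics]
        features[f"{name}_sum"] = sum(values)
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_max"] = max(values)
    return features


def ast_node_sequence(source):
    """Flatten a parsed AST into node-type tokens, in the order ast.walk yields them."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def skipgram_pairs(tokens, window=2):
    """Generate (target, context) pairs within a symmetric window, as used to train skip-gram."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical metrics for two files modified by a single commit.
    change = [{"loc": 120, "complexity": 8}, {"loc": 40, "complexity": 3}]
    print(aggregate_metrics(change))

    tokens = ast_node_sequence("x = a + 1")
    print(tokens)
    print(skipgram_pairs(tokens, window=1)[:3])
```

The metrics branch yields one fixed-length feature vector per code change regardless of how many files it touches, while the embedding branch turns code into a token stream from which a skip-gram model learns context-based vectors; how those vectors are pooled per change is not specified here and would be a further assumption.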
Pages: 469-478
Number of pages: 10