A Comparison of Source Code Representation Methods to Predict Vulnerability Inducing Code Changes

Authors
Halepmollasi, Rusen [1, 2]
Hanifi, Khadija [3 ]
Fouladi, Ramin F. [3 ]
Tosun, Ayse [1 ]
Affiliations
[1] Istanbul Tech Univ, Istanbul, Turkiye
[2] TUBITAK Informat & Informat Secur Res Ctr, Kocaeli, Turkiye
[3] Ericsson Secur Res, Istanbul, Turkiye
Keywords
Software Vulnerabilities; Software Metrics; Embeddings; Abstract Syntax Tree
DOI
10.5220/0011859300003464
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Vulnerability prediction is a data-driven process that utilizes previous vulnerability records and their associated fixes in software development projects. Vulnerability records are rare compared to other defects, even in large projects, and are usually not directly linked to the related code changes in the bug tracking system. Thus, preparing a vulnerability dataset and building a prediction model is quite challenging. Many studies propose software metrics-based or embedding/token-based approaches to predict software vulnerabilities over code changes. In this study, we compare the performance of two approaches in predicting code changes that induce vulnerabilities. The first approach is based on an aggregation of software metrics, while the second is based on an embedding representation of the source code built using an Abstract Syntax Tree and skip-gram techniques. We employed Deep Learning and popular Machine Learning algorithms to predict vulnerability-inducing code changes. We report our empirical analysis over code changes on the publicly available SmartSHARK dataset, which we extended by adding real vulnerability data. The software metrics-based code representation method shows better classification performance than the embedding-based code representation method in terms of recall, precision, and F1-score.
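As a rough illustration of the embedding-based idea mentioned in the abstract, the sketch below flattens an Abstract Syntax Tree into a sequence of node-type tokens and derives (center, context) training pairs in the skip-gram style. It is not the authors' pipeline: the use of Python's built-in `ast` module, the helper names `ast_node_tokens` and `skipgram_pairs`, and the `window` parameter are all assumptions made for illustration, and the study's actual parser, tokenization, and embedding model are not specified in this record.

```python
import ast
from typing import List, Tuple


def ast_node_tokens(source: str) -> List[str]:
    """Flatten source code into AST node-type names (hypothetical helper).

    Uses ast.walk, which yields nodes in no guaranteed order; a real
    pipeline would likely choose a fixed traversal such as pre-order.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs within a symmetric window.

    These pairs are the training examples a skip-gram model would
    consume; the embedding training itself is omitted here.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Toy code change; assumed input, not taken from SmartSHARK.
    tokens = ast_node_tokens("total = price * quantity")
    for center, context in skipgram_pairs(tokens, window=1):
        print(center, "->", context)
```

In such a setup, the resulting pairs would be fed to a skip-gram model to learn one vector per node type, and a code change could then be represented by combining the vectors of its tokens before classification.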
Pages: 469-478
Page count: 10