On the Vulnerability of Large Corpora Source Code

Cited by: 0
Authors
Barr, Joseph R. [1 ]
Thatcher, Tyler [1 ]
Affiliation
[1] Acronis SCS, Scottsdale, AZ 85251 USA
Keywords
Source Code; Android; OpenSSL; Linux; Recurrent Neural Networks; LSTM; Accuracy; Perplexity; Out-of-Vocabulary; Byte-Pair Encoding; Big Data; Unbalanced Data; Synthetic Sampling;
DOI
10.1109/ICSC52841.2022.00058
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper is part of a continuing effort to score functions in source code for vulnerability. For practical reasons, we restrict our attention to the C and C++ programming languages. We demonstrate an auto-encoder network and techniques for embedding source code into a low-dimensional Euclidean space, and discuss some of the issues encountered when dealing with a very large code base. We also describe a process for developing 'code smell' features and a classifier when the data is extremely unbalanced. Finally, we explore how the workflow may generalize to other projects and programming languages.
Pages: 314-317
Page count: 4