On the Vulnerability of Large Corpora Source Code

Cited by: 0
Authors
Barr, Joseph R. [1 ]
Thatcher, Tyler [1 ]
Affiliations
[1] Acronis SCS, Scottsdale, AZ 85251 USA
Keywords
Source Code; Android; OpenSSL; Linux; Recurrent Neural Networks; LSTM; Accuracy; Perplexity; Out-of-Vocabulary; Byte-Pair Encoding; Big Data; Unbalanced Data; Synthetic Sampling
DOI
10.1109/ICSC52841.2022.00058
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper is part of a continuing effort to score functions in source code for vulnerability. For practical reasons, we restrict our attention to the C and C++ programming languages. We demonstrate an auto-encoder network and techniques for embedding source code into a low-dimensional Euclidean space, and discuss some of the issues encountered when dealing with a very large code base. We also describe a process for developing "code smell" features and a classifier when the data is extremely unbalanced. Finally, we explore how the workflow may generalize to other projects and programming languages.
Pages: 314-317
Page count: 4