On the Vulnerability of Large Corpora Source Code

Times cited: 0

Authors:

Barr, Joseph R. [1]

Thatcher, Tyler [1]

Affiliation:

[1] Acronis SCS, Scottsdale, AZ 85251 USA

Source:

16TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2022) | 2022

Keywords:

Source Code; Android; OpenSSL; Linux; Recurrent Neural Networks; LSTM; Accuracy; Perplexity; Out-of-Vocabulary; Byte-Pair Encoding; Big Data; Unbalanced data; Synthetic Sampling;

DOI:

10.1109/ICSC52841.2022.00058

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper is part of a continuing effort to score functions in source code for vulnerability. For practical reasons, we restrict our attention to the C and C++ programming languages. We demonstrate an auto-encoder network and techniques for embedding source code into a low-dimensional Euclidean space, and discuss some of the issues encountered when dealing with a very large code base. We also describe a process for developing 'code smell' features and building a classifier when the data are extremely unbalanced. Finally, we explore how the workflow may generalize to other projects and programming languages.
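The abstract does not say how the extreme class imbalance is handled, although the keywords list "Synthetic Sampling". As a hedged illustration only, the sketch below shows SMOTE-style synthetic oversampling: each new minority-class point is placed on the segment between a randomly chosen minority sample and one of its k nearest minority neighbours. This is one common technique of that kind and is not confirmed to be the authors' method; the function name `smote_like_oversample`, its parameters, and the feature vectors are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built SMOTE-style from `minority`.

    Hypothetical helper (not from the paper): `minority` is a list of
    equal-length numeric tuples, e.g. assumed 'code smell' feature vectors
    of functions labelled vulnerable.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded for reproducible output
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: a random point on the segment from base to neighbour.
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    vulnerable = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_like_oversample(vulnerable, 3, k=2):
        print(point)
```

Because every synthetic point lies between two existing minority samples, the new data stay inside the region the minority class already occupies; the oversampled minority set would then be combined with the majority class before training a classifier.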

Pages: 314-317

Number of pages: 4