On the Vulnerability of Large Corpora Source Code

Authors
Barr, Joseph R. [1]
Thatcher, Tyler [1]
Affiliation
[1] Acronis SCS, Scottsdale, AZ 85251 USA
Keywords
Source Code; Android; OpenSSL; Linux; Recurrent Neural Networks; LSTM; Accuracy; Perplexity; Out-of-Vocabulary; Byte-Pair Encoding; Big Data; Unbalanced Data; Synthetic Sampling
DOI
10.1109/ICSC52841.2022.00058
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper is part of a continuing effort to score functions in source code for vulnerability. For practical reasons, we have restricted our attention to the C and C++ programming languages. We demonstrate an auto-encoder network and techniques for embedding source code into a low-dimensional Euclidean space, and we discuss some of the issues encountered when dealing with a very large code base. We also describe a process for developing 'code smell' features and a classifier for extremely unbalanced data. Finally, we explore how the workflow may generalize to other projects and programming languages.
Pages: 314-317
Page count: 4