Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study

Times cited: 3
Authors
Pereira, Jose D'Abruzzo [1]
Campos, Joao R. [1]
Vieira, Marco [1]
Affiliation
[1] Univ Coimbra, CISUC, DEI, Coimbra, Portugal
Keywords
Security; Vulnerability Detection; Static Code Analysis; Software Metrics; Analysis Tools
DOI
10.1109/EDCC53658.2021.00008
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but they are known to miss critical vulnerabilities and to raise a large number of false alarms. This paper studies the hypothesis that Machine Learning (ML) can combine SAT alerts with SMs to predict vulnerabilities in a large software project that has been under development for many years. Concretely, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable (binary classification) and to predict its vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand why, we analyze and compare snippets of source code and show that vulnerable and non-vulnerable files share similar characteristics, which makes it hard to distinguish between them based on SAT alerts and SMs.
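The general pipeline the abstract describes (per-file SAT alerts and SMs merged into one feature vector, a classifier trained on them, and the resulting precision/recall trade-off) can be sketched as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the alert categories, metric names, toy files, and labels are invented for illustration, and a hand-written logistic regression stands in for whichever of the four ML algorithms the study used (the abstract does not name them). Only the binary case is shown.

```python
import math
from collections import Counter

# Hypothetical feature names for illustration only; the abstract does not
# list the actual SAT rules or software metrics used in the study.
ALERT_CATEGORIES = ["buffer", "null_deref", "format_string"]
METRIC_NAMES = ["loc", "cyclomatic", "fan_in"]


def build_features(alerts, metrics):
    """Map each file to [SAT alert counts per category] + [software metric values]."""
    features = {}
    for path, sm in metrics.items():
        # alerts[path] is a list of (tool, category) pairs; files with no alerts get zeros.
        counts = Counter(category for _tool, category in alerts.get(path, []))
        features[path] = [float(counts[c]) for c in ALERT_CATEGORIES] + [
            float(sm[m]) for m in METRIC_NAMES
        ]
    return features


def standardize(rows):
    """Z-score each column so large metrics (e.g. LOC) do not swamp alert counts."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(k)]
    return [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]


def score(w, b, x):
    """Sigmoid of the linear score: estimated probability that the file is vulnerable."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-500.0, min(500.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression; a stand-in for any binary classifier."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def precision_recall(y_true, y_pred):
    """Precision and recall of the 'vulnerable' class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(scores, y, thresholds):
    """(threshold, precision, recall) for each cut-off on the predicted probability."""
    return [(t, *precision_recall(y, [s >= t for s in scores])) for t in thresholds]


# Made-up toy data: "e.c" has metrics close to the vulnerable "d.c" but is labelled clean.
alerts = {
    "a.c": [("toolA", "buffer"), ("toolB", "buffer"), ("toolA", "format_string")],
    "b.c": [("toolA", "null_deref")],
    "d.c": [("toolB", "buffer")],
    "f.c": [("toolA", "null_deref"), ("toolB", "buffer")],
}
metrics = {
    "a.c": {"loc": 1200, "cyclomatic": 85, "fan_in": 12},
    "b.c": {"loc": 300, "cyclomatic": 20, "fan_in": 3},
    "c.c": {"loc": 150, "cyclomatic": 8, "fan_in": 1},
    "d.c": {"loc": 900, "cyclomatic": 60, "fan_in": 9},
    "e.c": {"loc": 800, "cyclomatic": 55, "fan_in": 10},
    "f.c": {"loc": 400, "cyclomatic": 25, "fan_in": 4},
}
labels = {"a.c": 1, "b.c": 0, "c.c": 0, "d.c": 1, "e.c": 0, "f.c": 1}

files = sorted(metrics)
features = build_features(alerts, metrics)
X = standardize([features[f] for f in files])
y = [labels[f] for f in files]
w, b = train_logistic(X, y)
scores = [score(w, b, x) for x in X]
results = sweep(scores, y, [0.0, 0.3, 0.5, 0.7, 0.9])
for t, p, r in results:
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Sweeping the decision threshold makes the reported trade-off concrete: lowering the threshold can only keep or raise recall, while flagging more files tends to admit non-vulnerable ones and lower precision, an effect that grows when vulnerable and non-vulnerable files have similar feature values (as with the hypothetical `d.c` and `e.c`). The multiclass task would replace the 0/1 labels with vulnerability categories and require a multiclass classifier.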
Pages: 1-8
Number of pages: 8