Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study

Cited by: 3

Authors:

Pereira, Jose D'Abruzzo [1]

Campos, Joao R. [1]

Vieira, Marco [1]

Affiliation:

[1] Univ Coimbra, CISUC, DEI, Coimbra, Portugal

Source:

2021 17TH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC 2021) | 2021

Keywords:

Security; Vulnerability Detection; Static Code Analysis; Software Metrics; Analysis Tools

DOI:

10.1109/EDCC53658.2021.00008

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs.
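The abstract describes treating each source file as one sample whose features combine SAT alert counts with software metrics, then training a classifier to flag vulnerable files. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: a pure-Python logistic regression stands in for the four ML algorithms (which the abstract does not name), and the file paths, tool names (`sat_a`, `sat_b`), metric names (`loc`, `complexity`) and labels are invented toy data, not data or results from the study. Sweeping the decision threshold lets one observe how precision and recall move against each other.

```python
import math
from dataclasses import dataclass


@dataclass
class FileRecord:
    """One source file: SAT alert counts, software metrics, and a ground-truth label."""
    path: str
    sat_alerts: dict  # alert count per static analysis tool (hypothetical tool names)
    metrics: dict     # software metric values (hypothetical metric names)
    vulnerable: bool


def feature_vector(rec, sat_names, metric_names):
    """Combine SAT alert counts and software metrics into one numeric feature list."""
    alerts = [float(rec.sat_alerts.get(name, 0)) for name in sat_names]
    metrics = [float(rec.metrics.get(name, 0.0)) for name in metric_names]
    return alerts + metrics


def standardize(rows):
    """Z-score each column so alert counts and metrics share a comparable scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with plain batch gradient descent (a stand-in model)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def precision_recall(scores, labels, threshold):
    """Flag files whose score >= threshold as vulnerable; return (precision, recall)."""
    tp = sum(1 for s, label in zip(scores, labels) if s >= threshold and label)
    fp = sum(1 for s, label in zip(scores, labels) if s >= threshold and not label)
    fn = sum(1 for s, label in zip(scores, labels) if s < threshold and label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data, invented for illustration only.
SATS = ["sat_a", "sat_b"]
METRICS = ["loc", "complexity"]
files = [
    FileRecord("a.c", {"sat_a": 4, "sat_b": 1}, {"loc": 900, "complexity": 40}, True),
    FileRecord("b.c", {"sat_a": 0, "sat_b": 0}, {"loc": 120, "complexity": 5}, False),
    FileRecord("c.c", {"sat_a": 3, "sat_b": 2}, {"loc": 700, "complexity": 35}, False),
    FileRecord("d.c", {"sat_a": 1, "sat_b": 0}, {"loc": 300, "complexity": 12}, True),
    FileRecord("e.c", {"sat_a": 0, "sat_b": 1}, {"loc": 200, "complexity": 8}, False),
    FileRecord("f.c", {"sat_a": 5, "sat_b": 3}, {"loc": 1100, "complexity": 50}, True),
]

X = standardize([feature_vector(f, SATS, METRICS) for f in files])
y = [1 if f.vulnerable else 0 for f in files]
w, b = train_logistic(X, y)
# Scored on the training set for brevity; a real evaluation would use held-out files.
scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
for t in (0.3, 0.5, 0.7):
    p, r = precision_recall(scores, y, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold flags more files, which can only keep or raise recall while typically admitting more false alarms; raising it does the opposite. Predicting the vulnerability category (the multiclass task in the abstract) would swap the single sigmoid output for one score per category, for example via a softmax.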


Pages: 8
