Machine Learning to Combine Static Analysis Alerts with Software Metrics to Detect Security Vulnerabilities: An Empirical Study

Cited by: 3

Authors:

Pereira, Jose D'Abruzzo [1]

Campos, Joao R. [1]

Vieira, Marco [1]

Affiliations:

[1] Univ Coimbra, CISUC, DEI, Coimbra, Portugal

Source:

2021 17TH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC 2021) | 2021

Keywords:

Security; Vulnerability Detection; Static Code Analysis; Software Metrics; Analysis Tools

DOI:

10.1109/EDCC53658.2021.00008

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs.
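The precision/recall tension reported in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a hand-weighted score stands in for the four ML algorithms, and the `FileRecord` fields, the weights, and the toy data are all hypothetical, made up only to show how moving a decision threshold trades one measure for the other when vulnerable and non-vulnerable files have overlapping features.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    sat_alerts: int      # hypothetical: number of alerts the SATs raised for the file
    complexity: float    # hypothetical: one software metric, e.g. cyclomatic complexity
    vulnerable: bool     # ground-truth label


def score(rec, w_alerts=1.0, w_complexity=0.1):
    """Combine SAT alerts and a software metric into one score (assumed weights)."""
    return w_alerts * rec.sat_alerts + w_complexity * rec.complexity


def precision_recall(records, threshold):
    """Flag files whose score reaches the threshold; return (precision, recall)."""
    tp = fp = fn = 0
    for rec in records:
        predicted = score(rec) >= threshold
        if predicted and rec.vulnerable:
            tp += 1
        elif predicted and not rec.vulnerable:
            fp += 1
        elif not predicted and rec.vulnerable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic toy data: vulnerable and non-vulnerable files overlap in feature space.
TOY_FILES = [
    FileRecord(5, 40.0, True),
    FileRecord(1, 10.0, True),
    FileRecord(0, 15.0, True),
    FileRecord(4, 30.0, False),
    FileRecord(1, 12.0, False),
    FileRecord(0, 5.0, False),
]

if __name__ == "__main__":
    for threshold in (8.0, 1.0):
        p, r = precision_recall(TOY_FILES, threshold)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, the strict threshold flags only one file and is right about it, while the loose threshold catches every vulnerable file but also flags two clean ones. Neither setting gets both measures high, which mirrors the kind of trade-off the abstract describes.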

