A study of learning likely data structure properties using machine learning models

Cited by: 1

Authors:

Usman, Muhammad [1]

Wang, Wenxi [1]

Wang, Kaiyuan [1]

Yelen, Cagdas [1]

Dini, Nima [1]

Khurshid, Sarfraz [1]

Affiliation:

[1] Univ Texas Austin, Austin, TX 78712 USA

Source:

INTERNATIONAL JOURNAL ON SOFTWARE TOOLS FOR TECHNOLOGY TRANSFER | 2020 / Volume 22 / Issue 05

Funding:

National Science Foundation (USA);

Keywords:

Data structure invariants; Machine learning; Korat; Learnability;

DOI:

10.1007/s10009-020-00577-w

Chinese Library Classification number:

TP31 [Computer software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Data structure properties are important for many testing and analysis tasks. For example, model checkers use these properties to find program faults. These properties are often written manually, which can be error-prone and lead to false alarms. This paper presents the results of controlled experiments performed using existing machine learning (ML) models on various data structures. These data structures are dynamic and reside on the program heap. We use ten data structure subjects and ten ML models to evaluate the learnability of data structure properties. The study reveals five key findings. One, most of the ML models perform well in learning data structure properties, but some, such as quadratic discriminant analysis and Gaussian naive Bayes, are not suitable for this task. Two, most of the ML models achieve high performance even when trained on just 1% of the data samples. Three, certain data structure properties, such as those of the binary heap and the red-black tree, are more learnable than others. Four, there are no significant differences between the learnability of varied-size (i.e., up to a certain size) and fixed-size data structures. Five, there can be significant differences in performance based on the encoding used. These findings show that using machine learning models to learn data structure properties is very promising. We believe that these properties, once learned, can be used to provide a run-time check of whether a program state at a particular point satisfies the learned property. Learned properties can also be employed in the future to automate static and dynamic analysis, which would enhance software testing and verification techniques.
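To make the setting in the abstract concrete, the following is a minimal sketch of how labeled samples for one data structure property could be produced. It assumes a fixed-size binary max-heap encoded as a flat integer array; the names `is_max_heap` and `labeled_samples`, the encoding, and the exhaustive enumeration are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def is_max_heap(values):
    """Return True if the array-encoded binary tree satisfies the max-heap property.

    Assumed encoding (hypothetical): the children of index i sit at 2*i + 1 and 2*i + 2.
    """
    for child in range(1, len(values)):
        parent = (child - 1) // 2
        if values[parent] < values[child]:
            return False
    return True


def labeled_samples(size, max_value):
    """Enumerate every array of the given size and label it valid (1) or invalid (0)."""
    return [
        (candidate, int(is_max_heap(candidate)))
        for candidate in product(range(max_value + 1), repeat=size)
    ]


# Each (vector, label) pair is one sample a binary classifier could be trained on.
samples = labeled_samples(size=3, max_value=2)
positives = [vector for vector, label in samples if label == 1]
print(len(samples), len(positives))
```

A classifier trained on such pairs approximates the hand-written check; the abstract's run-time use case then amounts to encoding a program state the same way and querying the trained model.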


Pages: 601-615

Page count: 15
