A study of learning likely data structure properties using machine learning models

Cited by: 1

Authors:

Usman, Muhammad [1]

Wang, Wenxi [1]

Wang, Kaiyuan [1]

Yelen, Cagdas [1]

Dini, Nima [1]

Khurshid, Sarfraz [1]

Affiliations:

[1] Univ Texas Austin, Austin, TX 78712 USA

Source:

INTERNATIONAL JOURNAL ON SOFTWARE TOOLS FOR TECHNOLOGY TRANSFER | 2020 / Vol. 22 / Issue 05

Funding:

National Science Foundation (USA);

Keywords:

Data structure invariants; Machine learning; Korat; Learnability;

DOI:

10.1007/s10009-020-00577-w

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification codes:

081202; 0835;

Abstract:

Data structure properties are important for many testing and analysis tasks. For example, model checkers use these properties to find program faults. These properties are often written manually, which can be error-prone and lead to false alarms. This paper presents the results of controlled experiments performed using existing machine learning (ML) models on various data structures. These data structures are dynamic and reside on the program heap. We use ten data structure subjects and ten ML models to evaluate the learnability of data structure properties. The study reveals five key findings. One, most of the ML models perform well in learning data structure properties, but some, such as quadratic discriminant analysis and Gaussian naive Bayes, are not suitable for this task. Two, most of the ML models perform well even when trained on just 1% of the data samples. Three, certain data structure properties, such as those of the binary heap and the red-black tree, are more learnable than others. Four, there are no significant differences between the learnability of varied-size (i.e., up to a certain size) and fixed-size data structures. Five, there can be significant differences in performance depending on the encoding used. These findings show that using machine learning models to learn data structure properties is very promising. We believe that these properties, once learned, can be used to provide a run-time check of whether a program state at a particular point satisfies the learned property. Learned properties can also be employed in the future to automate static and dynamic analysis, which would enhance software testing and verification techniques.
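To make the notion of a "data structure property" concrete, the following is a minimal sketch, not the paper's tooling. It assumes an array encoding of a binary heap, which the abstract does not specify, and uses hypothetical helper names (`is_max_heap`, `labeled_samples`). A hand-written predicate labels every small candidate structure as valid or invalid. Labeled samples of this kind are what a classifier would be trained on, and the predicate itself is the sort of run-time check a learned property could stand in for.

```python
from itertools import product


def is_max_heap(values):
    """Return True if the array-encoded binary tree satisfies the max-heap property."""
    for child in range(1, len(values)):
        parent = (child - 1) // 2
        # A max-heap requires every parent to be at least as large as its child.
        if values[parent] < values[child]:
            return False
    return True


def labeled_samples(size, max_value):
    """Enumerate every array of the given size over 0..max_value, labeled by the property."""
    return [
        (candidate, is_max_heap(candidate))
        for candidate in product(range(max_value + 1), repeat=size)
    ]


# Assumed toy bounds: 3 slots, values in {0, 1, 2}.
samples = labeled_samples(size=3, max_value=2)
positives = [candidate for candidate, ok in samples if ok]
print(len(samples), len(positives))
```

With these toy bounds, 14 of the 27 candidates are valid heaps. At realistic sizes the valid structures are typically a much smaller fraction of the candidate space.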

Pages: 601-615

Number of pages: 15
