A study of learning likely data structure properties using machine learning models

Cited by: 1
Authors
Usman, Muhammad [1]
Wang, Wenxi [1]
Wang, Kaiyuan [1]
Yelen, Cagdas [1]
Dini, Nima [1]
Khurshid, Sarfraz [1]
Affiliation
[1] Univ Texas Austin, Austin, TX 78712 USA
Funding
U.S. National Science Foundation;
Keywords
Data structure invariants; Machine learning; Korat; Learnability;
DOI
10.1007/s10009-020-00577-w
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Data structure properties are important for many testing and analysis tasks. For example, model checkers use these properties to find program faults. These properties are often written manually, which is error-prone and can lead to false alarms. This paper presents the results of controlled experiments performed using existing machine learning (ML) models on various data structures. These data structures are dynamic and reside on the program heap. We use ten data structure subjects and ten ML models to evaluate the learnability of data structure properties. The study reveals five key findings. One, most of the ML models perform well in learning data structure properties, but some, such as quadratic discriminant analysis and Gaussian naive Bayes, are not suitable for this task. Two, most of the ML models achieve high performance even when trained on just 1% of the data samples. Three, the properties of certain data structures, such as binary heap and red-black tree, are more learnable than others. Four, there are no significant differences between the learnability of varied-size (i.e., up to a certain size) and fixed-size data structures. Five, performance can differ significantly depending on the encoding used. These findings show that using machine learning models to learn data structure properties is very promising. We believe that these properties, once learned, can be used to provide a run-time check of whether a program state at a particular point satisfies the learned property. Learned properties can also be employed in the future to automate static and dynamic analysis, which would enhance software testing and verification techniques.
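The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental pipeline: the array encoding, the tiny search space (arrays of length 4 with values 0 to 3), the 10% training split, and the 1-nearest-neighbour classifier are all assumptions chosen to keep the example self-contained, and the function names (`is_min_heap`, `build_dataset`, `predict_1nn`, `evaluate`) are hypothetical. The sketch labels every candidate structure with a hand-written predicate for the binary heap property, trains on a small fraction of the samples, and measures accuracy on the rest.

```python
import itertools
import random


def is_min_heap(values):
    """Hand-written property: every element is >= its parent in the array layout."""
    return all(values[(i - 1) // 2] <= values[i] for i in range(1, len(values)))


def build_dataset(size=4, max_value=3):
    """Enumerate every candidate array (assumed encoding) and label it with the property."""
    candidates = itertools.product(range(max_value + 1), repeat=size)
    return [(list(c), is_min_heap(c)) for c in candidates]


def predict_1nn(train, features):
    """Return the label of the closest training sample (squared Euclidean distance)."""
    def distance(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], features))
    return min(train, key=distance)[1]


def evaluate(train_fraction=0.1, seed=0):
    """Train on a small fraction of the samples and return held-out accuracy."""
    data = build_dataset()
    random.Random(seed).shuffle(data)
    cut = max(1, int(len(data) * train_fraction))
    train, test = data[:cut], data[cut:]
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {evaluate():.3f}")
```

Once trained, a call such as `predict_1nn(train, [0, 1, 2, 3])` plays the role of the run-time check mentioned in the abstract: it estimates whether a given program state satisfies the property without evaluating the hand-written predicate.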
Pages: 601-615
Number of pages: 15