A study of learning likely data structure properties using machine learning models

Cited by: 1
Authors
Usman, Muhammad [1]
Wang, Wenxi [1]
Wang, Kaiyuan [1]
Yelen, Cagdas [1]
Dini, Nima [1]
Khurshid, Sarfraz [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
Funding
U.S. National Science Foundation
Keywords
Data structure invariants; Machine learning; Korat; Learnability;
DOI
10.1007/s10009-020-00577-w
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Data structure properties are important for many testing and analysis tasks. For example, model checkers use these properties to find program faults. These properties are often written manually, which can be error-prone and lead to false alarms. This paper presents the results of controlled experiments performed using existing machine learning (ML) models on various data structures. These data structures are dynamic and reside on the program heap. We use ten data structure subjects and ten ML models to evaluate the learnability of data structure properties. The study reveals five key findings. One, most of the ML models perform well in learning data structure properties, but some, such as quadratic discriminant analysis and Gaussian naive Bayes, are not suitable for this task. Two, most of the ML models achieve high performance even when trained on just 1% of the data samples. Three, certain data structure properties, such as those of the binary heap and the red-black tree, are more learnable than others. Four, there are no significant differences between the learnability of varied-size (i.e., up to a certain size) and fixed-size data structures. Five, there can be significant differences in performance depending on the encoding used. These findings show that using machine learning models to learn data structure properties is very promising. We believe that these properties, once learned, can be used to provide a run-time check of whether a program state at a particular point satisfies the learned property. Learned properties can also be employed in the future to automate static and dynamic analysis, which would enhance software testing and verification techniques.
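As a rough illustration of the workflow the abstract describes (label data structure instances, encode them as feature vectors, train a classifier, then use it as a run-time check), the following minimal sketch may help. It is not the authors' setup: the study evaluates existing ML models on ten subjects, whereas this sketch substitutes a hand-written perceptron, enumerates small array-based heaps by brute force, and uses a hypothetical encoding (one 0/1 feature per parent-child pair). The names is_max_heap, encode, train_perceptron, and predict, as well as the 10% training fraction, are assumptions made only for this example.

```python
import itertools
import random


def is_max_heap(a):
    """Ground-truth property: every parent is >= each of its children."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def encode(a):
    """Hypothetical encoding: one 0/1 feature per parent-child pair."""
    return [1 if a[(i - 1) // 2] >= a[i] else 0 for i in range(1, len(a))]


def predict(model, x):
    """Classify a feature vector as 1 (property holds) or 0 (it does not)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=1000):
    """Train a perceptron on (features, label) pairs with labels in {0, 1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            delta = y - predict((w, b), x)  # 0 if correct, otherwise +1 or -1
            if delta != 0:
                mistakes += 1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
        if mistakes == 0:  # every training sample is classified correctly
            break
    return w, b


# Enumerate all arrays of length 5 over the values 0..3 (an assumed bound).
SIZE, MAX_VALUE = 5, 3
structures = itertools.product(range(MAX_VALUE + 1), repeat=SIZE)
dataset = [(encode(a), int(is_max_heap(a))) for a in structures]

# Train on a small fraction of the samples and evaluate on the rest.
rng = random.Random(0)
rng.shuffle(dataset)
split = len(dataset) // 10
train, test = dataset[:split], dataset[split:]

model = train_perceptron(train)
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"trained on {len(train)} samples, test accuracy = {accuracy:.3f}")

# Run-time check sketch: ask the learned model about a concrete state.
state = (3, 2, 3, 1, 0)
print("learned check:", predict(model, encode(state)), "exact:", is_max_heap(state))
```

Under this encoding the heap property is a conjunction of the features, so it is linearly separable and the perceptron can fit it exactly. Replacing encode with the raw array values would give the same learner a harder problem, which is one way to observe the kind of encoding sensitivity the abstract reports.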
Pages: 601-615
Page count: 15
Source: International Journal on Software Tools for Technology Transfer, 2020, 22: 601-615