Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD

Authors
del Rio, Tereso [1]
England, Matthew [1]
Affiliations
[1] Coventry Univ, Coventry, England
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Symbolic computation; Machine learning; Data augmentation; Classification; Regression; Cylindrical algebraic decomposition;
DOI
10.1007/s11786-024-00591-0
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
Symbolic computation algorithms and their implementations in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from being made separately for each problem by a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation. We utilise an existing dataset of examples derived from applications, which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us to balance and further augment the dataset, improving the machine learning results by 28% and 38% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem (classification) might be recast into the regression paradigm. While this does not radically change the performance, it does widen the scope in which the methodology can be applied to make choices.
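The abstract does not describe the augmentation technique or the regression recasting in detail, so the following is only a minimal sketch under stated assumptions. It assumes the augmentation works by renaming the variables of a polynomial system, so that one labelled example yields one example per permutation, with the best ordering relabelled to match. It also assumes the regression paradigm means predicting a cost for each candidate ordering and choosing the cheapest. The data representation and the names `permute_variables`, `augment`, and `choose_ordering` are hypothetical and do not come from the paper.

```python
from itertools import permutations


def permute_variables(polys, perm):
    """Rename variable i to perm[i] in every polynomial.

    Assumed representation: a polynomial is a dict that maps an
    exponent tuple (one entry per variable) to its coefficient.
    """
    renamed = []
    for poly in polys:
        new_poly = {}
        for exps, coeff in poly.items():
            new_exps = [0] * len(exps)
            for i, e in enumerate(exps):
                new_exps[perm[i]] = e
            new_poly[tuple(new_exps)] = coeff
        renamed.append(new_poly)
    return renamed


def augment(polys, best_order):
    """Yield (system, best ordering) for every renaming of the variables.

    best_order is a tuple of variable indices; renaming the variables
    renames the entries of the best ordering in the same way.
    """
    for perm in permutations(range(len(best_order))):
        new_label = tuple(perm[v] for v in best_order)
        yield permute_variables(polys, perm), new_label


def choose_ordering(predicted_cost):
    """Regression-style choice: pick the ordering with the lowest predicted cost."""
    return min(predicted_cost, key=predicted_cost.get)
```

Under these assumptions, a single example in three variables produces six examples, one for each possible best ordering, which is one way a dataset could be balanced with respect to the ordering decision. Choosing by predicted cost also works when only a subset of orderings is available, which a classifier with a fixed set of classes does not handle directly.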
Pages: 27
Related papers
50 in total
  • [21] A Comparative Study on Contemporary Intrusion Detection Datasets for Machine Learning Research
    Dwibedi, Smirti
    Pujari, Medha
    Sun, Weiqing
    2020 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS (ISI), 2020: 123-128
  • [22] Adversarial Machine Learning: A Comparative Study on Contemporary Intrusion Detection Datasets
    Pacheco, Yulexis
    Sun, Weiqing
    ICISSP: PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY, 2021: 160-171
  • [23] Management of technology, corporate culture and educational paradigms: lessons from a case study
    Manders, Andre J.C.
    The International journal of human factors in manufacturing, 1993, 3 (03): 231-242
  • [24] GPU-based similarity metrics computation and machine learning approaches for string similarity evaluation in large datasets
    Baloi, Aurel
    Belean, Bogdan
    Turcu, Flaviu
    Peptenatu, Daniel
    SOFT COMPUTING, 2024, 28 (04): 3465-3477
  • [25] GPU-based similarity metrics computation and machine learning approaches for string similarity evaluation in large datasets
    Aurel Baloi
    Bogdan Belean
    Flaviu Turcu
    Daniel Peptenatu
    Soft Computing, 2024, 28: 3465-3477
  • [26] Machine learning methods for cyber security intrusion detection: Datasets and comparative study
    Kilincer, Ilhan Firat
    Ertam, Fatih
    Sengur, Abdulkadir
    COMPUTER NETWORKS, 2021, 188
  • [27] Machine learning for encrypted malicious traffic detection: Approaches, datasets and comparative study
    Wang, Zihao
    Fok, Kar Wai
    Thing, Vrizlynn L. L.
    COMPUTERS & SECURITY, 2022, 113
  • [28] Case Study on Model-based Application of Machine Learning using Small CAD Databases for Cost Estimation
    Boerzel, Stefan
    Frochte, Joerg
    KDIR: PROCEEDINGS OF THE 11TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL 1: KDIR, 2019: 258-265
  • [29] Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction
    Narwane, Swati V.
    Sawarkar, Sudhir D.
    DIABETES & METABOLIC SYNDROME-CLINICAL RESEARCH & REVIEWS, 2022, 16 (09)
  • [30] Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study
    Marta Catillo
    Andrea Del Vecchio
    Antonio Pecchia
    Umberto Villano
    Software Quality Journal, 2022, 30: 955-981