Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD

Cited by: 0

Authors:

del Rio, Tereso [1]

England, Matthew [1]

Affiliations:

[1] Coventry Univ, Coventry, England

Source:

MATHEMATICS IN COMPUTER SCIENCE | 2024, Vol. 18, No. 3

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Symbolic computation; Machine learning; Data augmentation; Classification; Regression; Cylindrical algebraic decomposition;

DOI:

10.1007/s11786-024-00591-0

Chinese Library Classification:

O29 [Applied Mathematics];

Discipline classification code:

070104 ;

Abstract:

Symbolic Computation algorithms and their implementation in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from having them made separately for each problem via a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation. We utilise an existing dataset of examples derived from applications which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us to balance and further augment the dataset, improving the machine learning results by 28% and 38% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem (classification) might be recast into the regression paradigm. While this does not have a radical change on the performance, it does widen the scope in which the methodology can be applied to make choices.
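
The abstract does not spell out how the augmentation works. One plausible reading, assumed here purely for illustration, is that renaming the variables of a polynomial system yields a new instance whose best variable ordering is the original one with the same renaming applied. The sketch below follows that assumption; the sparse representation (a dict from exponent tuples to coefficients) and the names `permute_poly` and `augment` are hypothetical and not taken from the paper.

```python
from itertools import permutations


def permute_poly(poly, perm):
    """Rename variable i to variable perm[i] in one polynomial.

    `poly` maps exponent tuples to coefficients, e.g. x0^2 - x1*x2 in three
    variables is {(2, 0, 0): 1, (0, 1, 1): -1}. (Hypothetical representation.)
    """
    out = {}
    for exps, coeff in poly.items():
        new_exps = [0] * len(exps)
        for i, e in enumerate(exps):
            new_exps[perm[i]] = e  # the exponent follows the renamed variable
        out[tuple(new_exps)] = coeff
    return out


def augment(system, best_order):
    """Yield one (system, label) pair per variable renaming.

    `best_order` is a tuple of variable indices giving the labelled ordering.
    Assumption: the label transforms by the same renaming as the variables.
    """
    n = len(best_order)
    for perm in permutations(range(n)):
        new_system = [permute_poly(p, perm) for p in system]
        new_label = tuple(perm[v] for v in best_order)
        yield new_system, new_label
```

Under this assumption, a problem in three variables yields six labelled instances, one for each possible ordering label, which is what would allow an imbalanced dataset to be evened out.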

Pages: 27
