MULTIFIDELITY LINEAR REGRESSION FOR SCIENTIFIC MACHINE LEARNING FROM SCARCE DATA

Authors
Qian, Elizabeth [1, 2]
Kang, Dayoung [1]
Sella, Vignesh [3]
Chaudhuri, Anirban [3]
Affiliations
[1] Georgia Inst Technol, Sch Aerosp Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Univ Texas Austin, Oden Inst Computat Engn & Sci, Austin, TX USA
Keywords
Multifidelity methods; scientific machine learning; multifidelity machine learning; control variates; spectral properties; operator inference; model reduction; multilevel; approximation; networks
DOI
10.3934/fods.2024049
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Machine learning (ML) methods, which fit the parameters of a given parameterized model class to data, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems where traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data to train ML models is expensive, and the available budget for generating training data is limited, making high-fidelity training data scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context where data of varying fidelities and costs are available; for example, high-fidelity data may be generated by an expensive fully resolved physics simulation whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analyses of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves similar accuracy to the standard high-fidelity-only approach while requiring significantly less high-fidelity data.
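To make the control variate idea in the abstract concrete, the following is a minimal sketch and not the authors' estimator. It assumes a two-fidelity setting in which the regression coefficients solve the normal equations C beta = c, with C the second-moment matrix of the features and c the feature-output cross moment. The cross moment is estimated from a few high-fidelity samples and corrected with a low-fidelity control variate evaluated on many cheap samples. The names `features`, `f_hi`, `f_lo`, and `multifidelity_fit`, the polynomial basis, the fixed weight `alpha`, and the sample sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def features(x):
    # Hypothetical feature map: quadratic polynomial basis [1, x, x^2].
    return np.column_stack([np.ones_like(x), x, x**2])


def f_hi(x):
    # Stand-in for an expensive high-fidelity model (assumption).
    return np.sin(2.0 * x) + 0.5 * x


def f_lo(x):
    # Stand-in for a cheap, correlated low-fidelity model (assumption):
    # a truncated Taylor expansion of f_hi.
    return 2.0 * x - (4.0 / 3.0) * x**3 + 0.5 * x


def multifidelity_fit(x_n, y_hi_n, y_lo_n, x_m, y_lo_m, alpha=1.0):
    """Control-variate estimate of linear regression coefficients.

    x_n, y_hi_n, y_lo_n : few inputs with paired high/low-fidelity outputs
    x_m, y_lo_m         : many inputs with low-fidelity outputs only
    alpha               : control variate weight (fixed here; choosing it
                          well is part of the actual method)
    """
    phi_n = features(x_n)
    phi_m = features(x_m)
    # Second-moment matrix of the features, from the many cheap inputs.
    C = phi_m.T @ phi_m / len(x_m)
    # Monte Carlo estimates of the feature-output cross moment.
    c_hi_n = phi_n.T @ y_hi_n / len(x_n)
    c_lo_n = phi_n.T @ y_lo_n / len(x_n)
    c_lo_m = phi_m.T @ y_lo_m / len(x_m)
    # Control variate correction: shift the scarce-data estimate by the
    # discrepancy between the large- and small-sample low-fidelity estimates.
    c_mf = c_hi_n + alpha * (c_lo_m - c_lo_n)
    return np.linalg.solve(C, c_mf)


rng = np.random.default_rng(0)
x_few = rng.uniform(-1.0, 1.0, size=10)      # scarce high-fidelity budget
x_many = rng.uniform(-1.0, 1.0, size=2000)   # cheap low-fidelity budget

beta_mf = multifidelity_fit(
    x_few, f_hi(x_few), f_lo(x_few), x_many, f_lo(x_many)
)
# High-fidelity-only baseline: ordinary least squares on the 10 samples.
beta_hf = np.linalg.lstsq(features(x_few), f_hi(x_few), rcond=None)[0]
print("multifidelity coefficients:", beta_mf)
print("high-fidelity-only coefficients:", beta_hf)
```

With `alpha=0` the correction vanishes and only the high-fidelity cross moment is used. When the low-fidelity model coincides with the high-fidelity one and `alpha=1`, the estimate reduces to least squares on the large sample, which is a convenient sanity check for a sketch like this.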
Pages: 27