MULTIFIDELITY LINEAR REGRESSION FOR SCIENTIFIC MACHINE LEARNING FROM SCARCE DATA

Cited: 0
Authors
Qian, Elizabeth [1 ,2 ]
Kang, Dayoung [1 ]
Sella, Vignesh [3 ]
Chaudhuri, Anirban [3 ]
Affiliations
[1] Georgia Inst Technol, Sch Aerosp Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Univ Texas Austin, Oden Inst Computat Engn & Sci, Austin, TX USA
Keywords
Multifidelity methods; scientific machine learning; multifidelity machine learning; control variates; spectral properties; operator inference; model reduction; multilevel; approximation; networks
DOI
10.3934/fods.2024049
CLC number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Machine learning (ML) methods, which fit data to the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems where traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data to train ML models is expensive, and the available budget for generating training data is limited, making high-fidelity training data scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context in which data of varying fidelities and costs are available; for example, high-fidelity data may be generated by an expensive fully resolved physics simulation, whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analyses of the new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves accuracy similar to the standard high-fidelity-only approach while significantly reducing high-fidelity data requirements.
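To make the control-variate idea concrete, the following minimal Python sketch shows one way to combine a regression fit on scarce high-fidelity data with fits on cheaper low-fidelity data. The function names, the fixed weight alpha, and the synthetic data are illustrative assumptions only; the article's own estimators, weight selection, and bias/variance guarantees are developed in the full text.

```python
import numpy as np

# Minimal sketch (not the paper's exact estimator): a control-variate-style
# combination of regression coefficients fit on scarce high-fidelity (HF) data
# and abundant low-fidelity (LF) data. All names here are illustrative.

def fit_ols(X, y):
    """Ordinary least-squares coefficient estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def multifidelity_coefficients(X_hf, y_hf, y_lf_small, X_lf_large, y_lf_large,
                               alpha=1.0):
    """Control-variate-style multifidelity estimate of regression coefficients.

    beta_mf = beta_hf(small HF set)
              + alpha * (beta_lf(large LF set) - beta_lf(small LF set))

    The bracketed LF difference has approximately zero mean, so adding it can
    reduce variance of the scarce-data HF estimate; choosing alpha well is the
    key design question and is only fixed to 1.0 here for illustration.
    """
    beta_hf = fit_ols(X_hf, y_hf)            # scarce, expensive HF data
    beta_lf_small = fit_ols(X_hf, y_lf_small)  # LF responses at the HF inputs
    beta_lf_large = fit_ols(X_lf_large, y_lf_large)  # cheap, plentiful LF data
    return beta_hf + alpha * (beta_lf_large - beta_lf_small)

# Synthetic example: the LF model is a biased but correlated surrogate of HF.
rng = np.random.default_rng(0)
d, n_hf, n_lf = 3, 10, 1000
beta_true = np.array([1.0, -2.0, 0.5])

X_hf = rng.normal(size=(n_hf, d))
y_hf = X_hf @ beta_true + 0.1 * rng.normal(size=n_hf)
y_lf_small = 0.9 * (X_hf @ beta_true) + 0.1 * rng.normal(size=n_hf)

X_lf = rng.normal(size=(n_lf, d))
y_lf = 0.9 * (X_lf @ beta_true) + 0.1 * rng.normal(size=n_lf)

beta_mf = multifidelity_coefficients(X_hf, y_hf, y_lf_small, X_lf, y_lf)
print("multifidelity coefficient estimate:", beta_mf)
```

The design intent of this structure is that the high-fidelity fit sets the target (keeping bias controlled) while the paired low-fidelity fits supply variance reduction from the larger, cheaper sample.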
Pages: 27