MULTIFIDELITY LINEAR REGRESSION FOR SCIENTIFIC MACHINE LEARNING FROM SCARCE DATA

Citations: 0
Authors
Qian, Elizabeth [1 ,2 ]
Kang, Dayoung [1 ]
Sella, Vignesh [3 ]
Chaudhuri, Anirban [3 ]
Affiliations
[1] Georgia Inst Technol, Sch Aerosp Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[3] Univ Texas Austin, Oden Inst Computat Engn & Sci, Austin, TX USA
Keywords
Multifidelity methods; scientific machine learning; multifidelity machine learning; control variates; SPECTRAL PROPERTIES; OPERATOR INFERENCE; MODEL-REDUCTION; MULTILEVEL; APPROXIMATION; NETWORKS;
DOI
10.3934/fods.2024049
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Machine learning (ML) methods, which fit the parameters of a given parameterized model class to data, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the available budget for generating training data is limited, so that high-fidelity training data are scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context where data of varying fidelities and costs are available; for example, high-fidelity data may be generated by an expensive fully resolved physics simulation whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analyses of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves accuracy similar to that of the standard high-fidelity-only approach while requiring significantly less high-fidelity data.
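The control variate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: it is a generic approximate control variate correction applied to the moment vector of a least-squares problem, under assumed toy models. The feature map `features`, the stand-in models `f_hi` and `f_lo`, the weight `alpha`, and the function `mf_regression` are all hypothetical names introduced for illustration only.

```python
import numpy as np

def features(z):
    """Hypothetical feature map [1, z, z^2] for scalar inputs z."""
    return np.column_stack([np.ones_like(z), z, z**2])

def f_hi(z):
    """Stand-in for an expensive high-fidelity model (assumption)."""
    return np.sin(2.0 * z) + 0.5 * z**2

def f_lo(z):
    """Stand-in for a cheap, correlated low-fidelity model (assumption)."""
    return 2.0 * z + 0.5 * z**2

def moment(Phi, y):
    """Sample mean of phi(z_i) * y_i, a Monte Carlo estimate of E[phi(z) f(z)]."""
    return Phi.T @ y / len(y)

def mf_regression(z_all, y_lo_all, y_hi, alpha=1.0):
    """Control-variate-style linear regression coefficients (illustrative).

    z_all    : all m inputs; the first n are where the high-fidelity model was run
    y_lo_all : low-fidelity outputs at all m inputs
    y_hi     : high-fidelity outputs at the first n inputs
    alpha    : control variate weight (assumed fixed here, not optimized)
    """
    n = len(y_hi)
    Phi = features(z_all)
    # Gram matrix from all inputs: inputs are cheap, only model outputs are costly.
    G = Phi.T @ Phi / len(z_all)
    c_hi = moment(Phi[:n], y_hi)            # scarce high-fidelity estimate
    c_lo_n = moment(Phi[:n], y_lo_all[:n])  # low-fidelity estimate on the same n inputs
    c_lo_m = moment(Phi, y_lo_all)          # low-fidelity estimate on all m inputs
    # Correct the high-fidelity estimate with the low-fidelity discrepancy.
    c_mf = c_hi + alpha * (c_lo_m - c_lo_n)
    return np.linalg.solve(G, c_mf)

# Usage: 10 high-fidelity samples, 200 low-fidelity samples.
rng = np.random.default_rng(0)
z_all = rng.uniform(-1.0, 1.0, 200)
beta_mf = mf_regression(z_all, f_lo(z_all), f_hi(z_all[:10]))
print(beta_mf)
```

Setting `alpha=0.0` discards the low-fidelity outputs, while `alpha=1.0` with identical models reduces to an ordinary least-squares fit on all 200 samples. In practice the weight would be chosen from the correlation between the two models rather than fixed.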
Pages: 27
Related papers
50 results in total
  • [21] Multifidelity Statistical Machine Learning for Molecular Crystal Structure Prediction
    Egorova, Olga
    Hafizi, Roohollah
    Woods, David C.
    Day, Graeme M.
    JOURNAL OF PHYSICAL CHEMISTRY A, 2020, 124 (39): : 8065 - 8078
  • [22] Dynamic System Identification from Scarce and Noisy Data using Symbolic Regression
    Cohen, Benjamin
    Beykal, Burcu
    Bollas, George
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 3670 - 3675
  • [23] NMF-BASED KEYWORD LEARNING FROM SCARCE DATA
    Ons, Bart
    Gemmeke, Jort F.
    Van Hamme, Hugo
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 392 - 397
  • [24] Comparison of multifidelity machine learning models for potential energy surfaces
    Goodlett, Stephen M.
    Turney, Justin M.
    Schaefer, Henry F.
    JOURNAL OF CHEMICAL PHYSICS, 2023, 159 (04):
  • [25] A framework for data regression of heat transfer data using machine learning
    Loyola-Fuentes, Jose
    Nazemzadeh, Nima
    Diaz-Bejarano, Emilio
    Mancin, Simone
    Coletti, Francesco
    APPLIED THERMAL ENGINEERING, 2024, 248
  • [26] Linear regression-based multifidelity surrogate for disturbance amplification in multiphase explosion
    Giselle Fernandez-Godino, M.
    Dubreuil, Sylvain
    Bartoli, Nathalie
    Gogu, Christian
    Balachandar, S.
    Haftka, Raphael T.
    STRUCTURAL AND MULTIDISCIPLINARY OPTIMIZATION, 2019, 60 (06) : 2205 - 2220
  • [27] Deep Learning for Multifidelity Aerodynamic Distribution Modeling from Experimental and Simulation Data
    Li, Kai
    Kou, Jiaqing
    Zhang, Weiwei
    AIAA JOURNAL, 2022, 60 (07) : 4413 - 4427
  • [28] Machine Learning-Driven Event Characterization under Scarce Vehicular Sensing Data
    Taherifard, Nima
    Simsek, Murat
    Lascelles, Charles
    Kantarci, Burak
    2020 IEEE 25TH INTERNATIONAL WORKSHOP ON COMPUTER AIDED MODELING AND DESIGN OF COMMUNICATION LINKS AND NETWORKS (CAMAD), 2020,
  • [29] Forecasting accuracy of machine learning and linear regression: evidence from the secondary CAT bond market
    Götze, T.
    Gürtler, M.
    Witowski, E.
    Journal of Business Economics, 2023, 93 (9) : 1629 - 1660
  • [30] An Improved Anticipated Learning Machine for Daily Runoff Prediction in Data-Scarce Regions
    Hu, Wei
    Qian, Longxia
    Hong, Mei
    Zhao, Yong
    Fan, Linlin
    MATHEMATICAL GEOSCIENCES, 2025, 57 (01) : 49 - 88