Data Imbalance, Uncertainty Quantification, and Transfer Learning in Data-Driven Parameterizations: Lessons From the Emulation of Gravity Wave Momentum Transport in WACCM

Cited by: 2
Authors
Sun, Y. Qiang [1, 2]
Pahlavan, Hamid A. [1, 3]
Chattopadhyay, Ashesh [1, 4]
Hassanzadeh, Pedram [1, 2]
Lubis, Sandro W. [1, 5]
Alexander, M. Joan [3]
Gerber, Edwin P. [6]
Sheshadri, Aditi [7]
Guan, Yifei [1, 2]
Affiliations
[1] Rice Univ, Houston, TX 77005 USA
[2] Univ Chicago, Chicago, IL 60637 USA
[3] NorthWest Res Associates, Boulder, CO USA
[4] Univ Calif Santa Cruz, Santa Cruz, CA USA
[5] Pacific Northwest Natl Lab, Richland, WA USA
[6] NYU, New York, NY USA
[7] Stanford Univ, Palo Alto, CA USA
Keywords
data-driven parameterizations; data imbalance; uncertainty quantification; generalization via transfer learning; gravity wave momentum transport; CLIMATE; PARAMETRIZATION; MODEL; REPRESENTATION; INTERMITTENCY; TURBULENCE; WEATHER; PHYSICS; EVENTS; FLUX;
DOI
10.1029/2023MS004145
Chinese Library Classification (CLC) number
P4 [Atmospheric Sciences (Meteorology)];
Discipline classification code
0706; 070601;
Abstract
Neural networks (NNs) are increasingly used for data-driven subgrid-scale parameterizations in weather and climate models. While NNs are powerful tools for learning complex non-linear relationships from data, there are several challenges in using them for parameterizations. Three of these challenges are (a) data imbalance related to learning rare, often large-amplitude, samples; (b) uncertainty quantification (UQ) of the predictions to provide an accuracy indicator; and (c) generalization to other climates, for example, those with different radiative forcings. Here, we examine the performance of methods for addressing these challenges using NN-based emulators of the Whole Atmosphere Community Climate Model (WACCM) physics-based gravity wave (GW) parameterizations as a test case. WACCM has complex, state-of-the-art parameterizations for orography-, convection-, and front-driven GWs. Convection- and orography-driven GWs have significant data imbalance due to the absence of convection or orography in most grid points. We address data imbalance using resampling and/or weighted loss functions, enabling the successful emulation of parameterizations for all three sources. We demonstrate that three UQ methods (Bayesian NNs, variational auto-encoders, and dropouts) provide ensemble spreads that correspond to accuracy during testing, offering criteria for identifying when an NN gives inaccurate predictions. Finally, we show that the accuracy of these NNs decreases for a warmer climate (4×CO2). However, their performance is significantly improved by applying transfer learning, for example, re-training only one layer using ~1% new data from the warmer climate. The findings of this study offer insights for developing reliable and generalizable data-driven parameterizations for various processes, including (but not limited to) GWs.

Plain Language Summary
Scientists increasingly use machine learning methods, especially neural networks (NNs), to improve weather and climate models. However, it can be challenging for an NN to learn rare, large-amplitude events because they are infrequent in training data. In addition, NNs need to express their confidence (certainty) about a prediction and work effectively across different climates, for example, warmer climates due to increased CO2. Traditional NNs often struggle with these challenges. Here, we share insights from emulating known physics (gravity waves) with NNs in a state-of-the-art climate model. We propose specific strategies for effectively learning rare events, quantifying the uncertainty of NN predictions, and making reliable predictions across various climates. For instance, one strategy to address the learning of rare events involves inflating the impact of infrequent events in the training data. We also demonstrate that several methods could be useful in determining the uncertainty of the predictions. Furthermore, we show that NNs trained on simulations of the historical period do not perform as well in warmer climates. We then improve NN performance by employing transfer learning using limited new data from warmer climates. This study provides lessons for developing robust and generalizable approaches for using NNs to improve models in the future.

Key Points
  • Whole Atmosphere Community Climate Model's orographic, convective, and frontal gravity wave parameterizations are emulated using neural nets to inform future modeling efforts
  • Data imbalance is addressed via resampling and weighted loss; uncertainty quantification via Bayesian, dropout, and variational methods
  • Performance of the neural nets in a warmer climate is improved via transfer learning with ~1% new data
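
The two data-imbalance remedies named in the abstract, weighted loss functions and resampling, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not say how the weights or resampling ratios were chosen, so the amplitude-based weighting rule, the function names (`amplitude_weights`, `weighted_mse`, `oversample_rare`), and the toy data are all assumptions made for illustration only.

```python
import random


def amplitude_weights(targets, floor=1.0, scale=1.0):
    """Hypothetical weighting rule (an assumption, not from the paper):
    the weight grows linearly with |target|, so rare large-amplitude
    samples contribute more to the loss."""
    return [floor + scale * abs(t) for t in targets]


def weighted_mse(preds, targets, weights):
    """Weighted mean squared error, normalized by the sum of the weights."""
    total = sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights))
    return total / sum(weights)


def oversample_rare(samples, is_rare, factor, seed=0):
    """Resampling: repeat each rare sample `factor` times, then shuffle."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        out.extend([s] * (factor if is_rare(s) else 1))
    rng.shuffle(out)
    return out


# Toy imbalanced data: nine zero targets and one large-amplitude target.
targets = [0.0] * 9 + [10.0]
preds = [0.0] * 10  # a trivial "model" that always predicts zero

plain = weighted_mse(preds, targets, [1.0] * len(targets))
weighted = weighted_mse(preds, targets, amplitude_weights(targets))
resampled = oversample_rare(targets, lambda t: abs(t) > 1.0, factor=9)

print(plain, weighted, len(resampled))
```

In this toy case the always-zero predictor is penalized more heavily under the weighted loss than under the unweighted one, and oversampling brings the rare sample up to the same count as the common ones. Either effect pushes a model away from simply ignoring rare events.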
Pages: 27