Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Cited by: 9
Authors
Loh, Charlotte [1]
Christensen, Thomas [2]
Dangovski, Rumen [1]
Kim, Samuel [1]
Soljacic, Marin [2]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] MIT, Dept Phys, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
MULTIPLE-SCATTERING THEORY;
DOI
10.1038/s41467-022-31915-y
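Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract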
Deep learning techniques usually require a large quantity of training data and can be difficult to apply when data are scarce. The authors propose a framework that combines contrastive and transfer learning and reduces the data required for training while preserving prediction accuracy.

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in an orders-of-magnitude reduction in the number of labels needed to achieve the same network accuracies.
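As a rough illustration of how known invariances can feed a contrastive objective, the sketch below generates symmetry-transformed views of a square 2D input and scores two batches of embeddings with an NT-Xent (SimCLR-style) loss. The abstract does not specify the loss or the symmetry group, so both are assumptions made here for illustration, and the names `random_symmetry` and `nt_xent` are hypothetical, not taken from the paper.

```python
import numpy as np


def random_symmetry(x, rng):
    """Return a randomly rotated (multiples of 90 degrees) and possibly
    mirrored copy of a square 2D array. Assumed symmetry group, for
    illustration only."""
    x = np.rot90(x, k=int(rng.integers(4)))
    if rng.integers(2):
        x = np.fliplr(x)
    return x


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss for two batches of embeddings of shape (n, d).

    Row i of z1 and row i of z2 are treated as a positive pair; every other
    row in the combined batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an embedding is never its own negative
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(lse - sim[np.arange(2 * n), pos]))


rng = np.random.default_rng(0)
cell = rng.random((8, 8))            # stand-in for one unlabeled 2D sample
view = random_symmetry(cell, rng)    # a second view related by symmetry

z = rng.normal(size=(16, 16))        # stand-in embeddings of one set of views
loss_aligned = nt_xent(z, z)         # positives agree perfectly
loss_random = nt_xent(z, rng.normal(size=(16, 16)))  # positives unrelated
```

In this toy setup the loss is lower when the two views of each sample map to the same embedding, which is the behavior an encoder trained on symmetry-related views is pushed toward. The surrogate-data and transfer-learning stages mentioned in the abstract are not modeled here.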
Pages: 12