Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Cited by: 9
Authors
Loh, Charlotte [1]
Christensen, Thomas [2]
Dangovski, Rumen [1]
Kim, Samuel [1]
Soljacic, Marin [2]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] MIT, Dept Phys, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
MULTIPLE-SCATTERING THEORY;
DOI
10.1038/s41467-022-31915-y
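Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07; 0710; 09;
Abstract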
Deep learning techniques usually require large quantities of training data, which makes them challenging to apply to scarce datasets. The authors propose a framework that combines contrastive and transfer learning and reduces the data required for training while maintaining prediction accuracy.
Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework that incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders-of-magnitude reduction in the number of labels needed to achieve the same network accuracies.
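The abstract names contrastive learning with known symmetries as one ingredient but does not give a loss or architecture. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes an NT-Xent (SimCLR-style) loss, 90-degree rotations and flips as the invariances, random arrays as stand-ins for unlabeled inputs, and an untrained linear map as a hypothetical encoder; the names `random_symmetry`, `nt_xent`, `cells`, and `W` are illustrative.

```python
import numpy as np


def random_symmetry(x, rng):
    """Apply a random 90-degree rotation and an optional flip to a square 2D array.

    Assumption: the target property is invariant under these operations.
    """
    x = np.rot90(x, k=int(rng.integers(4)))
    if rng.integers(2):
        x = np.flip(x, axis=1)
    return x


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss; row i of z1 and row i of z2 are treated as a positive pair."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()


rng = np.random.default_rng(0)
cells = rng.random((8, 32, 32))  # stand-in for unlabeled inputs (hypothetical)

# Two symmetry-transformed views of every input form the positive pairs.
view1 = np.stack([random_symmetry(c, rng) for c in cells])
view2 = np.stack([random_symmetry(c, rng) for c in cells])

W = rng.normal(size=(32 * 32, 16))  # hypothetical untrained linear encoder
z1 = view1.reshape(8, -1) @ W
z2 = view2.reshape(8, -1) @ W

loss = nt_xent(z1, z2)
print(f"contrastive loss: {loss:.3f}")
```

In a real pipeline the encoder would be a trained network and the loss would be minimized over unlabeled (and, per the abstract, surrogate) data before fine-tuning on the few available labels; those steps are omitted here.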
Pages: 12
Related papers
50 in total
  • [11] Integrating Physical and Machine Learning Models for Enhanced Landslide Prediction in Data-Scarce Environments
    Al-Najjar, Husam A. H.
    Pradhan, Biswajeet
    He, Xuzhen
    Sheng, Daichao
    Alamri, Abdullah
    Gite, Shilpa
    Park, Hyuck-Jin
    EARTH SYSTEMS AND ENVIRONMENT, 2024,
  • [12] Stream salinity prediction in data-scarce regions: Application of transfer learning and uncertainty quantification
    Khodkar, Kasra
    Mirchi, Ali
    Nourani, Vahid
    Kaghazchi, Afsaneh
    Sadler, Jeffrey M.
    Mansaray, Abubakarr
    Wagner, Kevin
    Alderman, Phillip D.
    Taghvaeian, Saleh
    Bailey, Ryan T.
    JOURNAL OF CONTAMINANT HYDROLOGY, 2024, 266
  • [13] Self-Learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas
    Sameen, Maher Ibrahim
    Pradhan, Biswajeet
    Lee, Saro
    NATURAL RESOURCES RESEARCH, 2019, 28 : 757 - 775
  • [14] Contrastive learning: Big Data Foundations and Applications
    Tripathi, Sandhya
    King, Christopher R.
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 493 - 497
  • [15] Transfer Learning in Landslide Susceptibility Mapping: Bridging Data-Rich and Data-Scarce Regions in the Northwestern Himalayas
    Singh, Ankit
    Dhiman, Nitesh
    Shukla, Dericks Praise
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 3253 - 3256
  • [16] Integrated hydrodynamic and machine learning models for compound flooding prediction in a data-scarce estuarine delta
    Sampurno, Joko
    Vallaeys, Valentin
    Ardianto, Randy
    Hanert, Emmanuel
    NONLINEAR PROCESSES IN GEOPHYSICS, 2022, 29 (03) : 301 - 315
  • [17] Development of a Distributed Physics-Informed Deep Learning Hydrological Model for Data-Scarce Regions
    Zhong, Liangjin
    Lei, Huimin
    Yang, Jingjing
    WATER RESOURCES RESEARCH, 2024, 60 (06)
  • [18] Enhancing the performance of runoff prediction in data-scarce hydrological domains using advanced transfer learning
    Chen, Songliang
    Mao, Qinglin
    Feng, Youcan
    Li, Hongyan
    Ma, Donghe
    Zhao, Yilian
    Liu, Junhui
    Cheng, Hui
    RESOURCES ENVIRONMENT AND SUSTAINABILITY, 2024, 18
  • [19] An Open Data and Citizen Science Approach to Building Resilience to Natural Hazards in a Data-Scarce Remote Mountainous Part of Nepal
    Parajuli, Binod Prasad
    Khadka, Prakash
    Baskota, Preshika
    Shakya, Puja
    Liu, Wei
    Pudasaini, Uttam
    Roniksh, B. C.
    Paul, Jonathan D.
    Buytaert, Wouter
    Vij, Sumit
    SUSTAINABILITY, 2020, 12 (22) : 1 - 13
  • [20] Comparing conceptual and super ensemble deep learning models for streamflow simulation in data-scarce catchments
    Wegayehu, Eyob Betru
    Muluneh, Fiseha Behulu
    JOURNAL OF HYDROLOGY-REGIONAL STUDIES, 2024, 52