Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

Cited by: 9
Authors
Loh, Charlotte [1]
Christensen, Thomas [2]
Dangovski, Rumen [1]
Kim, Samuel [1]
Soljacic, Marin [2]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] MIT, Dept Phys, Cambridge, MA 02139 USA
Funding
US National Science Foundation
Keywords
MULTIPLE-SCATTERING THEORY
DOI
10.1038/s41467-022-31915-y
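Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09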
Abstract
Deep learning techniques usually require a large quantity of training data and may be challenging for scarce datasets. The authors propose a framework that involves contrastive and transfer learning and reduces data requirements for training while keeping the prediction accuracy.

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.
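As a rough illustration of how prior knowledge of an invariance can enter a contrastive objective, the sketch below pairs each sample with a symmetry-transformed copy and scores two encoders with a SimCLR-style NT-Xent loss. This is only a minimal sketch under stated assumptions: the abstract does not specify the loss, the data, or the encoder, so the cyclic-translation symmetry, the toy samples, and the functions `translate`, `raw_encoder`, and `invariant_encoder` are all hypothetical and are not taken from SIB-CL itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings: z1[i] and z2[i] are two views of sample i."""
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of view i
        # Similarities to every other view (positive and negatives), excluding i itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / (2 * n)


def translate(sample, shift):
    """Cyclic translation, standing in for a known symmetry of the problem (assumption)."""
    s = shift % len(sample)
    return sample[s:] + sample[:s]


def raw_encoder(sample):
    """Hypothetical encoder that ignores the symmetry: returns the raw values."""
    return list(sample)


def invariant_encoder(sample):
    """Hypothetical encoder built from translation-invariant statistics."""
    return [sum(sample), sum(x * x for x in sample), max(sample)]


# Toy unlabeled samples (made up for illustration only).
samples = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0],
]
# Positive pairs are generated from the assumed invariance, not from labels.
views = [translate(s, 1) for s in samples]

loss_raw = nt_xent([raw_encoder(s) for s in samples],
                   [raw_encoder(v) for v in views])
loss_invariant = nt_xent([invariant_encoder(s) for s in samples],
                         [invariant_encoder(v) for v in views])

print("NT-Xent, raw encoder:      ", loss_raw)
print("NT-Xent, invariant encoder:", loss_invariant)
```

On these toy samples the invariant encoder attains the lower loss, because its two views of each sample coincide exactly. In a trained network the encoder would instead be a learned model whose parameters are optimized to reduce this loss on unlabeled data, before fine-tuning on the scarce labels.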
Pages: 12