Scientific machine learning benchmarks

Cited by: 61
Authors
Thiyagalingam, Jeyan [1 ]
Shankar, Mallikarjun [2 ]
Fox, Geoffrey [3 ]
Hey, Tony [1 ]
Affiliations
[1] Sci & Technol Facil Council, Rutherford Appleton Lab, Harwell Campus, Didcot, Oxon, England
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
[3] Univ Virginia, Comp Sci & Biocomplex Inst, Charlottesville, VA USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1038/s42254-022-00441-7
Chinese Library Classification
O59 [Applied Physics]
Abstract
Finding the most appropriate machine learning algorithm for the analysis of any given scientific dataset is currently challenging, but new machine learning benchmarks for science are being developed to help. Deep learning has transformed the use of machine learning technologies for the analysis of large experimental datasets. In science, such datasets are typically generated by large-scale experimental facilities, and machine learning focuses on the identification of patterns, trends and anomalies to extract meaningful scientific insights from the data. In upcoming experimental facilities, such as the Extreme Photonics Application Centre (EPAC) in the UK or the international Square Kilometre Array (SKA), the rate of data generation and the scale of data volumes will increasingly require the use of more automated data analysis. However, at present, identifying the most appropriate machine learning algorithm for the analysis of any given scientific dataset is a challenge due to the potential applicability of many different machine learning frameworks, computer architectures and machine learning models. Historically, for modelling and simulation on high-performance computing systems, these issues have been addressed through benchmarking computer applications, algorithms and architectures. Extending such a benchmarking approach and identifying metrics for the application of machine learning methods to open, curated scientific datasets is a new challenge for both scientists and computer scientists. Here, we introduce the concept of machine learning benchmarks for science and review existing approaches. As an example, we describe the SciMLBench suite of scientific machine learning benchmarks.
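The benchmarking idea the abstract describes, pairing an open, curated dataset with a candidate model and recording standard metrics, can be sketched in a few lines of Python. This is a minimal illustration of the concept only; every name below (`run_benchmark`, `BenchmarkResult`, the toy model) is hypothetical and is not the SciMLBench API:

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkResult:
    """Record for one benchmark run: wall-clock time plus a throughput metric."""
    name: str
    runtime_s: float
    throughput: float  # samples processed per second


def run_benchmark(name: str,
                  model: Callable[[float], bool],
                  dataset: List[float]) -> BenchmarkResult:
    """Time a model over every sample in a dataset and report throughput."""
    start = time.perf_counter()
    for sample in dataset:
        model(sample)
    elapsed = time.perf_counter() - start
    return BenchmarkResult(name, elapsed, len(dataset) / elapsed)


# Hypothetical stand-ins: a synthetic dataset and a trivial anomaly check
# playing the role of a trained model.
dataset = [float(i) for i in range(10_000)]
result = run_benchmark("toy-anomaly-detector", lambda x: x > 9_000.0, dataset)
print(f"{result.name}: {result.runtime_s:.4f} s, {result.throughput:.0f} samples/s")
```

A real scientific ML benchmark suite would additionally report science-quality metrics (for example, detection accuracy on held-out data) alongside such performance metrics, and would fix the dataset and evaluation protocol so that results are comparable across frameworks and architectures.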
Pages: 413-420
Page count: 8
Related Papers
(50 in total)
  • [31] A Performance Characterization of Scientific Machine Learning Workflows
    Krawczuk, Patrycja
    Papadimitriou, George
    Tanaka, Ryan
    Do, Tu Mai Anh
    Subramanya, Srujana
    Nagarkar, Shubham
    Jain, Aditi
    Lam, Kelsie
    Mandal, Anirban
    Pottier, Loic
    Deelman, Ewa
    PROCEEDINGS OF 16TH WORKSHOP ON WORKFLOWS IN SUPPORT OF LARGE-SCALE SCIENCE (WORKS21), 2021, : 58 - 65
  • [32] A Machine Learning Gateway for Scientific Workflow Design
    Broll, Brian
    Timalsina, Umesh
    Volgyesi, Peter
    Budavari, Tamas
    Ledeczi, Akos
    SCIENTIFIC PROGRAMMING, 2020, 2020
  • [33] Causal scientific explanations from machine learning
    Buijsman, Stefan
    SYNTHESE, 2023, 202 (06)
  • [34] Forecasting benchmarks of long-term stock returns via machine learning
    Kyriakou, Ioannis
    Mousavi, Parastoo
    Nielsen, Jens Perch
    Scholz, Michael
    ANNALS OF OPERATIONS RESEARCH, 2021, 297 : 221 - 240
  • [35] BIKED: A DATASET AND MACHINE LEARNING BENCHMARKS FOR DATA-DRIVEN BICYCLE DESIGN
    Regenwetter, Lyle
    Curry, Brent
    Ahmed, Faez
    PROCEEDINGS OF ASME 2021 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2021, VOL 3A, 2021,
  • [36] Spectroscopic Benchmarks by Machine Learning as Discriminant Analysis for Unconventional Italian Pictorialism Photography
    Scatigno, Claudia
    Teodonio, Lorenzo
    Di Rocco, Eugenia
    Festa, Giulia
    POLYMERS, 2024, 16 (13)
  • [37] Interpretable models for extrapolation in scientific machine learning
    Muckley, Eric S.
    Saal, James E.
    Meredig, Bryce
    Roper, Christopher S.
    Martin, John H.
    DIGITAL DISCOVERY, 2023, 2 (05): : 1425 - 1435
  • [39] Workflow provenance in the lifecycle of scientific machine learning
    Souza, Renan
    Azevedo, Leonardo G.
    Lourenco, Vitor
    Soares, Elton
    Thiago, Raphael
    Brandao, Rafael
    Civitarese, Daniel
    Brazil, Emilio Vital
    Moreno, Marcio
    Valduriez, Patrick
    Mattoso, Marta
    Cerqueira, Renato
    Netto, Marco A. S.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (14):
  • [40] PDEBENCH: An Extensive Benchmark for Scientific Machine Learning
    Takamoto, Makoto
    Praditia, Timothy
    Leiteritz, Raphael
    MacKinlay, Dan
    Alesiani, Francesco
    Pflueger, Dirk
    Niepert, Mathias
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,